THE PENALIZED PROFILE SAMPLER

By Guang Cheng and Michael R. Kosorok

Duke University and University of North Carolina at Chapel Hill 
The penalized profile sampler for semiparametric inference is an 
extension of the profile sampler method [8] obtained by profiling a
penalized log-likelihood. The idea is to base inference on the pos- 
terior distribution obtained by multiplying a profiled penalized log- 
likelihood by a prior for the parametric component, where the pro- 
filing and penalization are applied to the nuisance parameter. Be- 
cause the prior is not applied to the full likelihood, the method is not 
strictly Bayesian. A benefit of this approximately Bayesian method 
is that it circumvents the need to put a prior on the possibly infinite- 
dimensional nuisance components of the model. We investigate the 
first and second order frequentist performance of the penalized pro- 
file sampler, and demonstrate that the accuracy of the procedure can 
be adjusted by the size of the assigned smoothing parameter. The 
theoretical validity of the procedure is illustrated for two examples: 
a partly linear model with normal error for current status data and 
a semiparametric logistic regression model. As far as we are aware, 
there are no other methods of inference in this context known to have 
second order frequentist validity. 

1. Introduction. Semiparametric models are statistical models indexed by both a finite dimensional parameter of interest $\theta$ and an infinite dimensional nuisance parameter $\eta$. The profile likelihood is typically defined as

$$pl_n(\theta)=\sup_{\eta\in\mathcal H}\mathrm{lik}_n(\theta,\eta),$$

where $\mathrm{lik}_n(\theta,\eta)$ is the likelihood of the semiparametric model given $n$ observations and $\mathcal H$ is the parameter space for $\eta$. We also define

$$\hat\eta_\theta=\arg\max_{\eta\in\mathcal H}\mathrm{lik}_n(\theta,\eta).$$

The convergence rate of the nuisance parameter $\eta$ is the order of $d(\hat\eta_{\tilde\theta_n},\eta_0)$, where $d(\cdot,\cdot)$ is some metric on $\eta$, $\tilde\theta_n$ is any sequence satisfying $\tilde\theta_n=\theta_0+o_P(1)$, and $\eta_0$ is the true value of $\eta$. Typically,

$$d(\hat\eta_{\tilde\theta_n},\eta_0)=O_P(\|\tilde\theta_n-\theta_0\|+n^{-r}), \tag{1}$$

where $\|\cdot\|$ is the Euclidean norm and $r>1/4$. Of course, a smaller value of $r$ leads to a slower convergence rate of the nuisance parameter. For instance, the nuisance parameter in the Cox proportional hazards model with right censored data, the cumulative hazard function, has the parametric rate, i.e., $r=1/2$. If current status data are used with the Cox model instead, the convergence rate is slower, with $r=1/3$, due to the loss of information in this kind of data.

The profile sampler is the procedure of sampling from the posterior of the profile likelihood in order to estimate and draw inference on the parametric component $\theta$ in a semiparametric model, where the profiling is done over the possibly infinite-dimensional nuisance parameter $\eta$. It is shown in [8] that the profile sampler gives a first order correct approximation to the maximum likelihood

estimator $\hat\theta_n$ and consistent estimation of the efficient Fisher information for $\theta$, even when the nuisance parameter is not estimable at the $\sqrt n$ rate. Another Bayesian procedure employed to do semiparametric estimation is considered in [16], who study the marginal semiparametric posterior distribution for a parameter of interest. In particular, [16] show that marginal semiparametric posterior distributions are asymptotically normal and centered at the corresponding maximum likelihood estimates or posterior means, with covariance matrix equal to the inverse of the Fisher information. Unfortunately, this fully Bayesian method requires specification of a prior on $\eta$, which is quite challenging since for some models there is no direct extension of the concept of a Lebesgue dominating measure for the infinite-dimensional parameter set involved [7]. The advantages of the profile sampler for estimating $\theta$ compared to other methods are discussed extensively in [2] and [8].

In many semiparametric models involving a smooth nuisance parameter, it is often convenient and beneficial to perform estimation using penalization. One motivation for this is that, in the absence of any restrictions on the form of the function $\eta$, maximum likelihood estimation for some semiparametric models leads to over-fitting. Seminal applications of penalized maximum likelihood estimation include estimation of a probability density function in [?] and nonparametric linear regression in [18]. Note that penalized likelihood is a special case of penalized quasi-likelihood studied in [?]. Under certain reasonable regularity conditions, penalized semiparametric log-likelihood estimation can yield fully efficient estimates for $\theta$ (see, for example, [3]). As far as we are aware, the only general procedure for inference for $\theta$ in this context known to be theoretically valid is a weighted bootstrap

with bounded random weights (see [?]). It is even unclear whether the usual nonparametric bootstrap will work in this context when the nuisance parameter has a convergence rate $r<1/2$.

In contrast, [?] and [?] have shown that the profile sampler procedure without penalization can essentially yield second order frequentist valid inference for $\theta$ in semiparametric models, where the estimation accuracy depends on the convergence rate of the nuisance parameter. In other words, a faster convergence rate of the nuisance parameter yields more precise frequentist inference for $\theta$. These second order results are verified in [?] and [?] for several examples, including the Cox model for both right censored and current status data, the proportional odds model, case-control studies with missing covariates, and the partly linear normal model. The convergence rates for these models range from the parametric to the cubic. The work in [?] has shown clearly that the accuracy of the inference for $\theta$ based on the profile sampler method is intrinsically determined by the semiparametric model specifications through its entropy number.

The purpose of this paper is to ask the somewhat natural question: does sampling from a profiled penalized log-likelihood (a procedure we hereafter refer to as the penalized profile sampler) yield first and even second order accurate frequentist inference? The conclusion of this paper is that the answer is yes and, moreover, that the accuracy of the inference depends in a fairly simple way on the size of the smoothing parameter.

The unknown parameters in the semiparametric models we study in this paper include $\theta$, which we assume belongs to some compact set $\Theta\subset\mathbb R^d$, and $\eta$, which we assume to be a function in the Sobolev class of functions

supported on some compact set on the real line, whose $k$-th derivative exists and is absolutely continuous with $J(\eta)<\infty$, where

$$J^2(\eta)=\int_{\mathcal Z}\big(\eta^{(k)}(z)\big)^2\,dz.$$

Here $k$ is a fixed positive integer and $\eta^{(j)}$ is the $j$-th derivative of $\eta$ with respect to $z$. Clearly $J^2(\eta)$ is a measure of the complexity of $\eta$. We denote by $\mathcal H_k$ the Sobolev function class of degree $k$. The penalized log-likelihood in this context is:

$$\log\mathrm{lik}_{\lambda_n}(\theta,\eta)=\log\mathrm{lik}(\theta,\eta)-\lambda_n^2J^2(\eta), \tag{2}$$

where $\log\mathrm{lik}(\theta,\eta)=\mathbb P_n\ell_{\theta,\eta}(X)$, $\ell_{\theta,\eta}(X)$ is the log-likelihood of a single observation $X$, and $\lambda_n$ is a smoothing parameter, possibly dependent on data. In practice, $\lambda_n$ can be obtained by cross-validation [?] or by inspecting the various curves for different values of $\lambda_n$. The penalized maximum likelihood estimators $\hat\theta_n$ and $\hat\eta_n$ depend on the choice of the smoothing parameter $\lambda_n$. Consequently we use the notation $\hat\theta_{\lambda_n}$ and $\hat\eta_{\lambda_n}$ for the remainder of this paper to denote the estimators obtained from maximizing (2). In particular, a larger smoothing parameter usually leads to a less rough penalized estimator of $\eta_0$.
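To make the penalty in (2) concrete, the following is a minimal numerical sketch, assuming $\eta$ is available as a Python function on the interval $[0,1]$. It approximates $J^2(\eta)$ by $k$-th forward differences on a uniform grid and subtracts $\lambda_n^2J^2(\eta)$ from a supplied value of $\log\mathrm{lik}(\theta,\eta)$. The helper names `sobolev_penalty` and `penalized_loglik`, the interval and the grid size are illustrative choices of ours, not part of the method described above.

```python
import math
from typing import Callable


def sobolev_penalty(eta: Callable[[float], float], k: int,
                    a: float = 0.0, b: float = 1.0, m: int = 201) -> float:
    """Approximate J^2(eta) = int_a^b (eta^(k)(z))^2 dz on a uniform grid.

    eta^(k) is replaced by the k-th forward difference divided by h^k, and the
    integral by (b - a) times the mean of the squared differences.
    """
    h = (b - a) / (m - 1)
    values = [eta(a + i * h) for i in range(m)]
    diffs = []
    for i in range(m - k):
        # k-th forward difference: sum_j (-1)^(k - j) C(k, j) eta(z_{i + j})
        d = sum((-1) ** (k - j) * math.comb(k, j) * values[i + j]
                for j in range(k + 1))
        diffs.append(d / h ** k)
    return (b - a) * sum(d * d for d in diffs) / len(diffs)


def penalized_loglik(loglik: float, eta: Callable[[float], float],
                     lam: float, k: int) -> float:
    """Criterion (2): log lik(theta, eta) - lam^2 * J^2(eta)."""
    return loglik - lam ** 2 * sobolev_penalty(eta, k)


# eta(z) = z^2 has second derivative 2, so J^2 on [0, 1] with k = 2 is 4.
print(round(sobolev_penalty(lambda z: z * z, k=2), 6))                  # 4.0
print(round(penalized_loglik(-1.0, lambda z: z * z, lam=0.5, k=2), 6))  # -2.0
```

Forward differences are exact for polynomials of degree $k$, which makes the toy check easy to verify by hand; for general $\eta$ the approximation error shrinks as the grid is refined.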

For the purpose of establishing first order accuracy of inference for 9 
based on the penalized profile sampler, we assume that the bounds for the 
smoothing parameter are in the form below: 

$$\lambda_n=o_P(n^{-1/4})\quad\text{and}\quad\lambda_n^{-1}=O_P(n^{k/(2k+1)}). \tag{3}$$

Condition (3) is assumed to hold throughout this paper. One way to ensure (3) in practice is simply to set $\lambda_n=n^{-k/(2k+1)}$. Or we can just choose

$\lambda_n=n^{-1/3}$, which is independent of $k$. It turns out that the upper bound guarantees that $\hat\theta_{\lambda_n}$ is $\sqrt n$-consistent, while the lower bound controls the convergence rate of the penalized nuisance parameter estimator. Another approach to controlling estimators is to use sieve estimates with assumptions on the derivatives (see [?]). We will not pursue this further here.
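For deterministic polynomial choices $\lambda_n=n^{-a}$, condition (3) reduces to $1/4<a\le k/(2k+1)$. The small check below is our own illustration (the function name is hypothetical). It confirms that both choices mentioned above satisfy (3) for every $k$, while a slower decay such as $n^{-1/5}$ violates the upper bound $\lambda_n=o(n^{-1/4})$.

```python
from fractions import Fraction


def satisfies_condition_3(a: Fraction, k: int) -> bool:
    """Check (3) for the deterministic choice lambda_n = n^(-a).

    lambda_n = o(n^(-1/4))        <=>  a > 1/4
    1/lambda_n = O(n^(k/(2k+1)))  <=>  a <= k/(2k+1)
    """
    return Fraction(1, 4) < a <= Fraction(k, 2 * k + 1)


for k in (1, 2, 3):
    print(k,
          satisfies_condition_3(Fraction(k, 2 * k + 1), k),  # n^(-k/(2k+1)): True
          satisfies_condition_3(Fraction(1, 3), k),          # n^(-1/3): True
          satisfies_condition_3(Fraction(1, 5), k))          # n^(-1/5): False
```

Exact rational arithmetic is used so that the boundary case $a=k/(2k+1)$ is classified correctly.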
The log-profile penalized likelihood is defined as follows: 

$$\log pl_{\lambda_n}(\theta)=\log\mathrm{lik}(\theta,\hat\eta_{\theta,\lambda_n})-\lambda_n^2J^2(\hat\eta_{\theta,\lambda_n}), \tag{4}$$

where $\hat\eta_{\theta,\lambda_n}$ is $\arg\max_{\eta\in\mathcal H_k}\log\mathrm{lik}_{\lambda_n}(\theta,\eta)$ for fixed $\theta$ and $\lambda_n$. The penalized profile sampler is just the procedure of sampling from the posterior distribution of $pl_{\lambda_n}(\theta)$ by assigning a prior on $\theta$. By analyzing the corresponding MCMC chain from the frequentist's point of view, our paper obtains the following conclusions:

1. Distribution Approximation: The posterior distribution with respect to $pl_{\lambda_n}(\theta)$ can be approximated by the normal distribution with mean the maximum penalized likelihood estimator of $\theta$ and variance the inverse of the efficient information matrix, with error $O_P(n^{1/2}\lambda_n^2)$;

2. Moment Approximation: The maximum penalized likelihood estimator of $\theta$ can be approximated by the mean of the MCMC chain with error $O_P(\lambda_n^2)$. The efficient information matrix can be approximated by the inverse of the variance of the MCMC chain with error $O_P(n^{1/2}\lambda_n^2)$;

3. Confidence Interval Approximation: An exact frequentist confidence interval of Wald's type for $\theta$ can be estimated by the credible set obtained from the MCMC chain with error $O_P(\lambda_n^2)$.
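The three conclusions above translate into simple summaries of the chain. The following is a minimal sketch, not the authors' implementation, assuming a scalar $\theta$, a flat prior, a plain random-walk Metropolis update, and a user-supplied function `log_pl` that returns the full-sample log profile penalized likelihood (that is, $n$ times (4), so the inner computation of $\hat\eta_{\theta,\lambda_n}$ happens inside it). The chain variance is multiplied by $n$ before inversion because the posterior spread is of order $n^{-1/2}$. The quadratic toy target and all names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def profile_sampler(log_pl: Callable[[float], float], theta_start: float,
                    n_iter: int, step: float, seed: int = 0) -> List[float]:
    """Random-walk Metropolis chain whose target density is proportional to
    exp(log_pl(theta)), i.e. a flat prior on a scalar theta (assumption)."""
    rng = random.Random(seed)
    theta, current = theta_start, log_pl(theta_start)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_pl(proposal)
        # accept with probability min(1, exp(candidate - current))
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        chain.append(theta)
    return chain


def summarize(chain: List[float], n: int,
              level: float = 0.95) -> Tuple[float, float, Tuple[float, float]]:
    """Chain mean (estimate of theta), 1 / (n * chain variance) (estimate of
    the efficient information) and an equal-tailed credible interval."""
    m = len(chain)
    mean = sum(chain) / m
    var = sum((t - mean) ** 2 for t in chain) / (m - 1)
    ordered = sorted(chain)
    lower = ordered[int((1 - level) / 2 * (m - 1))]
    upper = ordered[int((1 + level) / 2 * (m - 1))]
    return mean, 1.0 / (n * var), (lower, upper)


# Hypothetical toy target: a quadratic full-sample log profile penalized
# likelihood centred at 1.0 with efficient information 2.0 and n = 400.
n, info, centre = 400, 2.0, 1.0


def toy_log_pl(th: float) -> float:
    return -0.5 * n * info * (th - centre) ** 2


chain = profile_sampler(toy_log_pl, theta_start=centre, n_iter=20000, step=0.08)
mean, info_hat, interval = summarize(chain[2000:], n)
print(mean, info_hat, interval)
```

With the fixed seed, the printed chain mean and information estimate should be close to the toy values 1.0 and 2.0, and the interval should cover 1.0. In a real application `log_pl` would carry out the maximization over $\eta$ for each proposed $\theta$, which is usually the dominant cost.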

Obviously, given any smoothing parameter satisfying the upper bound 

in (3), the penalized profile sampler can yield first order frequentist valid inference for $\theta$, similar to what was shown for the profile sampler in [8]. Moreover, the above conclusions are actually second order frequentist valid results, whose approximation accuracy is directly controlled by the smoothing parameter. Note that the corresponding results for the usual (non-penalized) profile sampler with nuisance parameter convergence rate $r$ in [?] are obtained by replacing $O_P(n^{1/2}\lambda_n^2)$ above with $O_P(n^{-1/2}\vee n^{-2r+1/2})$ and $O_P(\lambda_n^2)$ with $O_P(n^{-1}\vee n^{-2r})$, in all respective occurrences, where $r$ is as defined in (1). In this correspondence $\lambda_n$ plays the role of $n^{-r}$.
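The correspondence between the two sets of error bounds can be tabulated by tracking exponents. The sketch below assumes deterministic rates $\lambda_n=n^{-a}$ and returns exponents $e$ for errors of order $n^{e}$; the function names are ours. With $a=r=1/3$ the penalized and non-penalized bounds coincide, which illustrates $\lambda_n$ playing the role of $n^{-r}$.

```python
from fractions import Fraction


def penalized_orders(a: Fraction) -> dict:
    """Error exponents e (error = O_P(n^e)) for lambda_n = n^(-a)."""
    return {"distribution": Fraction(1, 2) - 2 * a,   # n^(1/2) * lambda_n^2
            "moment": -2 * a}                         # lambda_n^2


def unpenalized_orders(r: Fraction) -> dict:
    """Same exponents for the non-penalized profile sampler with rate r in (1)."""
    return {"distribution": max(Fraction(-1, 2), Fraction(1, 2) - 2 * r),
            "moment": max(Fraction(-1), -2 * r)}


# With a = r = 1/3 the two sets of bounds coincide.
print(penalized_orders(Fraction(1, 3)) == unpenalized_orders(Fraction(1, 3)))  # True
```

The distribution exponent $1/2-2a$ is negative exactly when $a>1/4$, which is the upper bound in (3) and mirrors the requirement $r>1/4$ in (1).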

Our results are the first higher order frequentist inference results for penalized semiparametric estimation. The layout of the article is as follows. The next section, section 2, introduces the two main examples we will use for illustration: partly linear regression for current status data and semiparametric logistic regression. Some background is given in section 3, including the concept of a least favorable submodel, as well as some notation and the main model assumptions. In section 4 some preliminary results are developed, including three rather different theorems concerning the convergence rates of the penalized nuisance parameters and the order of the estimated penalty term under different conditions. The corresponding rates for the two featured examples are also calculated in this section. The main results and implications are discussed in section 5, and all remaining model assumptions are verified for the examples in section 6. A brief discussion of future work is given in section 7. We postpone all technical tools and proofs to the last section, section 8.

2. Examples. 

2.1. Partly Linear Normal Model with Current Status Data. In this example, we study the partly linear regression model with normal residual error. The continuous outcome $Y$, conditional on the covariates $(U,V)\in\mathbb R^d\times\mathbb R$, is modeled as

$$Y=\theta^TU+f(V)+\epsilon, \tag{5}$$

where $f$ is an unknown smooth function, and $\epsilon\sim N(0,\sigma^2)$ with finite variance $\sigma^2$. For simplicity, we assume for the rest of the paper that $\sigma=1$. The theory we propose also works when $\sigma$ is unknown, but the added complexity would detract from the main issues. We also assume that only the current status of the response $Y$ is observed at a random censoring time $C\in\mathbb R$. In other words, we observe $X=(C,\Delta,U,V)$, where the indicator $\Delta=1\{Y\le C\}$. Current status data may occur due to study design or measurement limitations. Examples of such data arise in several fields, including demography, epidemiology and econometrics. For simplicity of exposition, $\theta$ is assumed to be one dimensional.

Under the model (5) and given that the joint distribution of $(C,U,V)$ does not involve the parameters $(\theta,f)$, the log-likelihood for a single observation at $X=x=(c,\delta,u,v)$ is

$$\ell_{\theta,f}(x)=\delta\log\{\Phi(c-\theta u-f(v))\}+(1-\delta)\log\{1-\Phi(c-\theta u-f(v))\}, \tag{6}$$

where $\Phi$ is the standard normal distribution function. The parameter of interest, $\theta$, is assumed to belong to some compact set in $\mathbb R$. The nuisance parameter is the function $f$, which belongs to the Sobolev function class of degree $k$.
We further make the following assumptions on this model. We assume that 

$Y$ and $C$ are independent given $(U,V)$. The covariates $(U,V)$ are assumed to belong to some compact set, and the support of the random censoring time $C$ is an interval $[l_c,u_c]$, where $-\infty<l_c<u_c<\infty$. In addition, $E\,\mathrm{Var}(U\mid V)$ is strictly positive and $Ef(V)=0$. The first order asymptotic behavior of the penalized log-likelihood estimates of a slightly more general version of this model has been extensively studied in [9].
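A direct transcription of (6) may help when implementing the profile step for this example. The sketch assumes $\sigma=1$ and $\Delta=1\{Y\le C\}$ as above, represents $f$ by an arbitrary Python callable, and uses hypothetical function names; it evaluates only the log-likelihood and its sample average $\mathbb P_n\ell_{\theta,f}$, not the maximization over $f$.

```python
import math
from typing import Callable, Sequence, Tuple


def std_normal_cdf(x: float) -> float:
    """Standard normal distribution function Phi via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def loglik_current_status(theta: float, f: Callable[[float], float],
                          c: float, delta: int, u: float, v: float) -> float:
    """Log-likelihood (6) of one observation x = (c, delta, u, v), sigma = 1.

    Only the branch selected by delta is evaluated, so log(0) from the other
    branch never occurs; extreme arguments would still need a log-Phi routine.
    """
    p = std_normal_cdf(c - theta * u - f(v))   # P(Y <= c | U = u, V = v)
    return math.log(p) if delta == 1 else math.log(1.0 - p)


def mean_loglik(theta: float, f: Callable[[float], float],
                data: Sequence[Tuple[float, int, float, float]]) -> float:
    """P_n applied to (6): average log-likelihood over (c, delta, u, v)."""
    return sum(loglik_current_status(theta, f, *x) for x in data) / len(data)


print(mean_loglik(0.5, lambda v: v - 0.5, [(0.0, 1, 1.0, 0.2), (1.5, 0, 0.3, 0.9)]))
```

Combined with a discretized penalty and an optimizer over a finite-dimensional representation of $f$, this function would give one evaluation of the profiled penalized criterion (4).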

2.2. Semiparametric Logistic Regression. Let $X_1=(Y_1,W_1,Z_1)$, $X_2=(Y_2,W_2,Z_2),\ldots$ be independent copies of $X=(Y,W,Z)$, where $Y$ is a dichotomous variable with conditional expectation $E(Y\mid W,Z)=F(\theta^TW+\eta(Z))$. Here $F(u)=e^u/(e^u+1)$ is the logistic distribution function. The likelihood for a single observation is then of the following form:

$$p_{\theta,\eta}(x)=F(\theta^Tw+\eta(z))^y\{1-F(\theta^Tw+\eta(z))\}^{1-y}f^{(W,Z)}(w,z), \tag{7}$$

where $f^{(W,Z)}$ is the joint density of $(W,Z)$. This example is a special case of quasi-likelihood in partly linear models when the conditional variance of the response $Y$ is taken to be a quadratic function of the conditional mean of $Y$. In the absence of any restrictions on the form of the function $\eta$, maximum likelihood estimation in this simple model often leads to over-fitting. Hence [?] propose maximizing instead the penalized likelihood of the form $\log\mathrm{lik}(\theta,\eta)-\lambda_n^2J^2(\eta)$, and [12] studied the asymptotic properties of the maximum penalized likelihood estimators for $\theta$ and $\eta$. For simplicity, we will restrict ourselves to the case where $\Theta\subset\mathbb R^1$ and $(W,Z)$ have bounded support, say $[0,1]^2$. To ensure the identifiability of the parameters, we assume that $E\,\mathrm{Var}(W\mid Z)$ is positive and that the support of $Z$ contains at least $k$ distinct points in $[0,1]$.
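Similarly, the $(\theta,\eta)$-dependent part of the log of (7) can be coded with a numerically stable softplus, since $\log F(s)=-\log(1+e^{-s})$ and $\log\{1-F(s)\}=-\log(1+e^{s})$. The term $\log f^{(W,Z)}(w,z)$ is dropped because it does not involve $(\theta,\eta)$. This is a sketch with hypothetical names, for scalar $\theta$ and $w$.

```python
import math
from typing import Callable


def softplus(x: float) -> float:
    """log(1 + e^x), computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def loglik_logistic(theta: float, eta: Callable[[float], float],
                    y: int, w: float, z: float) -> float:
    """(theta, eta)-dependent part of log (7) for one observation (y, w, z)."""
    s = theta * w + eta(z)
    # log F(s) = -softplus(-s); log(1 - F(s)) = -softplus(s)
    return -softplus(-s) if y == 1 else -softplus(s)


print(loglik_logistic(1.0, lambda z: z, 1, 1.0, 1.0))  # log F(2)
```

Writing both branches through `softplus` avoids overflow in `math.exp` for large linear predictors, which matters when a rough $\eta$ is evaluated during optimization.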

Remark 1. Another interesting potential example we may apply the 

penalized profile sampler method to is the classic proportional hazards model with current status data, by penalizing the cumulative hazard function with its Sobolev norm. There are two motivations for penalizing the cumulative hazard function in the Cox model. One is that the estimated step functions from the unpenalized estimation cannot be used easily for other estimation or inference purposes. Another issue with the unpenalized approach is that, without stronger continuity assumptions, we cannot achieve uniform consistency even on a compact set. The asymptotic properties of the corresponding penalized M-estimators have been studied in [?].

3. Preliminaries. In this section, we present some necessary prelimi- 
nary material concerning least favorable submodels, general notational con- 
ventions for the paper, and an enumeration of the main assumptions. 

3.1. Least favorable submodels. In this subsection, we briefly review the concept of a least favorable submodel. A submodel $t\mapsto p_{t,\eta_t}$ is defined to be least favorable at $(\theta,\eta)$ if $\tilde\ell_{\theta,\eta}=\frac{\partial}{\partial t}\log p_{t,\eta_t}$ at $t=\theta$, where $\tilde\ell_{\theta,\eta}$ is the efficient score function for $\theta$. The efficient score function for $\theta$ can be viewed as the residual of the projection of the score function for $\theta$ onto the tangent space of $\eta$, that is, the score for $\theta$ minus its projection onto that tangent space. Its variance is exactly the efficient information matrix $\tilde I_{\theta,\eta}$. We hereafter abbreviate $\tilde\ell_{\theta_0,\eta_0}$ and $\tilde I_{\theta_0,\eta_0}$ as $\tilde\ell_0$ and $\tilde I_0$, respectively. The "direction" along which $\eta_t$ approaches $\eta$ in the least favorable submodel is called the least favorable direction. An insightful review of least favorable submodels and efficient score functions can be found in Chapter 3 of [6]. By the above construction of the least favorable submodel, $\log pl_{\lambda_n}(\theta)$ can be
rewritten in the following form: 

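$$\log pl_{\lambda_n}(\theta)=\mathbb P_n\ell(\theta,\theta,\hat\eta_{\theta,\lambda_n})-\lambda_n^2J^2(\hat\eta_{\theta,\lambda_n}), \tag{8}$$

where $\ell(t,\theta,\eta)(x)=\log\mathrm{lik}(t,\eta_t(\theta,\eta))(x)$, and $t\mapsto\eta_t(\theta,\eta)$ is a general map from a neighborhood of $\theta$ into the parameter set for $\eta$, with $\eta_\theta(\theta,\eta)=\eta$. The concrete form of (8) will depend on the situation.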
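The projection description of the efficient score can be checked in a finite-dimensional toy model, where $\eta$ is a Euclidean vector and the full Fisher information is partitioned into $\theta$ and $\eta$ blocks. There the efficient score is $s_\theta-I_{\theta\eta}I_{\eta\eta}^{-1}s_\eta$ (with $s_\theta,s_\eta$ the ordinary scores) and its variance is the Schur complement $I_{\theta\theta}-I_{\theta\eta}I_{\eta\eta}^{-1}I_{\eta\theta}$. The sketch below, for scalar $\theta$ in plain Python with hypothetical names, computes that quantity. It is only the parametric analogue of the infinite-dimensional construction above.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def efficient_information(info: Matrix) -> float:
    """Variance of the efficient score for theta (first coordinate), with the
    remaining coordinates as nuisance: I_tt - I_te I_ee^{-1} I_et."""
    i_tt = info[0][0]
    i_te = info[0][1:]
    i_ee = [row[1:] for row in info[1:]]
    i_et = [row[0] for row in info[1:]]
    x = solve(i_ee, i_et)          # I_ee^{-1} I_et
    return i_tt - sum(p * q for p, q in zip(i_te, x))


print(efficient_information([[2.0, 1.0], [1.0, 2.0]]))  # 1.5
```

When the $\theta$ and $\eta$ scores are orthogonal ($I_{\theta\eta}=0$) nothing is lost; otherwise not knowing $\eta$ strictly reduces the information about $\theta$, which is what the least favorable submodel captures.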

3.2. Notation. We present in this subsection some notation that will be used throughout the paper. The derivatives of the function $\ell(t,\theta,\eta)$ are with respect to its first argument, $t$; we write $\dot\ell$, $\ddot\ell$ and $\ell^{(3)}$ for the first, second and third such derivatives. For the derivatives relative to the other two arguments $\theta$ and $\eta$, we use the following shortened notation: $\ell_\theta(t,\theta,\eta)$ indicates the first derivative of $\ell(t,\theta,\eta)$ with respect to $\theta$. Similarly, $\ell_{t,\theta}(t,\theta,\eta)$ denotes the derivative of $\dot\ell(t,\theta,\eta)$ with respect to $\theta$. Also, $\ell_t(\theta)$ and $\ell_{t,\theta}(\eta)$ indicate the maps $\theta\mapsto\ell(t,\theta,\eta)$ and $\eta\mapsto\ell_{t,\theta}(t,\theta,\eta)$, respectively. For brevity, we denote $\dot\ell_0=\dot\ell(\theta_0,\theta_0,\eta_0)$, $\ddot\ell_0=\ddot\ell(\theta_0,\theta_0,\eta_0)$ and $\ell_0^{(3)}=\ell^{(3)}(\theta_0,\theta_0,\eta_0)$, where $\theta_0$ and $\eta_0$ are the true values of $\theta$ and $\eta$. Of course, we can write $\dot\ell_0(X)$ as $\tilde\ell_0(X)$. The symbols $\|\cdot\|$ and $\|\cdot\|_2$ indicate the Euclidean norm and the $L_2$ norm, respectively. The notations $\gtrsim$ and $\lesssim$ mean greater than or equal to, and smaller than or equal to, up to a universal constant. The symbols $\mathbb P_n$ and $\mathbb G_n=\sqrt n(\mathbb P_n-P)$ are used for the empirical distribution and the empirical process of the observations, respectively.
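For concreteness, $\mathbb P_nf$ and $\mathbb G_nf=\sqrt n(\mathbb P_n-P)f$ can be evaluated as follows for a scalar function $f$ when $Pf$ is known, as in a simulation. These are hypothetical helpers for illustration only.

```python
import math
from typing import Callable, Sequence


def empirical_mean(f: Callable[[float], float], xs: Sequence[float]) -> float:
    """P_n f = n^{-1} * sum_i f(X_i)."""
    return sum(f(x) for x in xs) / len(xs)


def empirical_process(f: Callable[[float], float], xs: Sequence[float],
                      pf: float) -> float:
    """G_n f = sqrt(n) * (P_n f - P f), with P f supplied by the caller."""
    return math.sqrt(len(xs)) * (empirical_mean(f, xs) - pf)


# Uniform(0, 1) example: P f = 1/3 for f(x) = x^2.
xs = [0.2, 0.4, 0.6, 0.8]
print(empirical_process(lambda x: x * x, xs, 1.0 / 3.0))  # approximately -0.0667
```

Conditions such as (15)-(17) are statements about the size of $\mathbb G_n$ evaluated at random, data-dependent functions, which is why equicontinuity and Donsker-type arguments are needed rather than this pointwise computation.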

3.3. Main Assumptions. We now make the following three classes of assumptions: Rate assumptions (R1) for the penalized nuisance parameter and the estimated penalty term; Smoothness assumptions (S1-S2); and Empirical processes assumptions (E1) for $\ell(t,\theta,\eta)$ and its related derivatives.

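R1: Assume:

$$d(\hat\eta_{\tilde\theta_n,\lambda_n},\eta_0)=O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|) \tag{9}$$

and

$$\lambda_nJ(\hat\eta_{\tilde\theta_n,\lambda_n})=O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|). \tag{10}$$

S1: The maps

$$(t,\theta,\eta)\mapsto\frac{\partial^{l+m}}{\partial t^l\,\partial\theta^m}\ell(t,\theta,\eta) \tag{11}$$

have integrable envelope functions in $L_1(P)$ in some neighborhood of $(\theta_0,\theta_0,\eta_0)$, for $(l,m)=(0,0),(1,0),(2,0),(3,0),(1,1),(1,2),(2,1)$.

S2: Assume:

$$P\ddot\ell(\theta_0,\theta_0,\eta)-P\ddot\ell_0=O(d(\eta,\eta_0)), \tag{12}$$

$$P\ell_{t,\theta}(\theta_0,\theta_0,\eta)-P\ell_{t,\theta}(\theta_0,\theta_0,\eta_0)=O(d(\eta,\eta_0)), \tag{13}$$

$$P\dot\ell(\theta_0,\theta_0,\eta)=O(d^2(\eta,\eta_0)), \tag{14}$$

for all $\eta$ in some neighborhood of $\eta_0$.

E1: For all random sequences $\bar\theta_n=\tilde\theta_n+o_P(1)$ and $\tilde\theta_n=\theta_0+o_P(1)$, we have

$$\mathbb G_n\big(\dot\ell(\theta_0,\theta_0,\hat\eta_{\tilde\theta_n,\lambda_n})-\dot\ell_0\big)=O_P\big(n^{1/2}(\lambda_n+\|\tilde\theta_n-\theta_0\|)^2\big), \tag{15}$$

$$\mathbb G_n\ddot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})=O_P(1), \tag{16}$$

$$\mathbb G_n\ell_{t,\theta}(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})=O_P(1), \tag{17}$$

$$(\mathbb P_n-P)\ell^{(3)}(\bar\theta_n,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})=o_P(1). \tag{18}$$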

Assumption R1 implicitly assumes that we have a metric or topology defined on the set of possible values of the nuisance parameter $\eta$. The form of

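$d(\eta,\eta_0)$ may vary for different situations and does not need to be specified in this subsection beyond the given conditions. Condition (9) implies that $\hat\eta_{\tilde\theta_n,\lambda_n}$ is consistent for $\eta_0$ as $\tilde\theta_n\to\theta_0$ in probability. Additionally, from (10) we know that the smoothing parameter $\lambda_n$ plays a role in determining the complexity of the estimated nuisance parameter. Condition (10) implies that $J(\hat\eta_{\lambda_n})=O_P(1)$ if $\hat\theta_{\lambda_n}$ is asymptotically normal, which has been shown in [?]: in that case $\|\hat\theta_{\lambda_n}-\theta_0\|=O_P(n^{-1/2})=o_P(\lambda_n)$ by (3). Note that $J(\hat\eta_{\tilde\theta_n})\ge J(\hat\eta_{\tilde\theta_n,\lambda_n})$, where $\hat\eta_{\theta,0}=\hat\eta_\theta=\arg\max_{\eta\in\mathcal H}\log\mathrm{lik}(\theta,\eta)$ for a fixed $\theta$; this follows from the inequality $\log\mathrm{lik}_{\lambda_n}(\tilde\theta_n,\hat\eta_{\tilde\theta_n})\le\log\mathrm{lik}_{\lambda_n}(\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})$ combined with $\log\mathrm{lik}(\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})\le\log\mathrm{lik}(\tilde\theta_n,\hat\eta_{\tilde\theta_n})$.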

Clearly, assumptions S1 and S2 are, respectively, the smoothness conditions for the Euclidean parameters $(t,\theta)$ and for the infinite dimensional nuisance parameter $\eta$. The boundedness of the Fréchet derivatives of the maps $\eta\mapsto\ddot\ell(\theta_0,\theta_0,\eta)$ and $\eta\mapsto\ell_{t,\theta}(\theta_0,\theta_0,\eta)$ ensures the validity of conditions (12) and (13). Based on the discussion in section 2 of [3], under the given regularity conditions, it suffices for (14) that the map $\eta\mapsto\dot\ell(\theta_0,\theta_0,\eta)$ is Fréchet differentiable and the map $\eta\mapsto\mathrm{lik}(\theta_0,\eta)$ is second order Fréchet differentiable.

Condition (15) is concerned with the asymptotic equicontinuity of the empirical process measure of $\dot\ell(\theta_0,\theta_0,\eta)$ with $\eta$ ranging over a neighborhood of $\eta_0$. It suffices for (16) and (17) that $\mathbb G_n(\ddot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})-\ddot\ell_0)=o_P(1)$ and $\mathbb G_n(\ell_{t,\theta}(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})-\ell_{t,\theta}(\theta_0,\theta_0,\eta_0))=o_P(1)$, provided $\ddot\ell_0$ and $\ell_{t,\theta}(\theta_0,\theta_0,\eta_0)$ are square integrable. Thus we will be able to use technical tools T2 and T6 given in the appendix to show (15)-(17). For the verification of (18), we need to make use of a Glivenko-Cantelli theorem for classes of functions that change with $n$, which is a modification of theorem 2.4.3 in [21] and is explained in the appendix.



imsart-aos ver. 2006/01/04 file: penalized.tex date: February 2, 2008 




In principle, assumptions S1, S2 and E1 on the functions of the least favorable submodel directly imply the following empirical no-bias conditions:

$$\mathbb P_n\dot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})=\mathbb P_n\dot\ell_0+O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|)^2, \tag{19}$$

$$\mathbb P_n\ddot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})=P\ddot\ell_0+O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|). \tag{20}$$

The derivations of (19) and (20) are simply based on the regular Taylor expansions around the true values. The detailed arguments can be found in the proofs of lemmas 1 and 2 in [3]. The two empirical no-bias conditions ensure that the penalized profile likelihood behaves asymptotically like a penalized likelihood in a parametric model, and therefore yield a second order asymptotic expansion of the penalized profile log-likelihood.

4. The Penalized Convergence Rate. In the previous section, we imposed two assumptions about the convergence rate of the estimated nuisance parameter and the order of the estimated penalty term, i.e., (9) and (10). To compute these rates, we present three different theorems below which require different sets of conditions. These theorems can be viewed as extensions of general results on M-estimators to penalized M-estimators, and are therefore of independent interest. We first state the classical definitions of the covering number (entropy number) and the bracketing number (bracketing entropy number) for a class of functions.

Definition: Let $\mathcal A$ be a subset of a (pseudo-)metric space $(\mathcal L,d)$ of real-valued functions. The $\delta$-covering number $N(\delta,\mathcal A,d)$ of $\mathcal A$ is the smallest $N$ for which there exist functions $a_1,\ldots,a_N$ in $\mathcal L$ such that for each $a\in\mathcal A$, $d(a,a_j)\le\delta$ for some $j\in\{1,\ldots,N\}$. The $\delta$-bracketing number $N_B(\delta,\mathcal A,d)$ is the smallest $N$ for which there exist pairs of functions $\{[a_j^L,a_j^U]\}_{j=1}^N\subset\mathcal L$,

with $d(a_j^L,a_j^U)\le\delta$, $j=1,\ldots,N$, such that for each $a\in\mathcal A$ there is a $j\in\{1,\ldots,N\}$ such that $a_j^L\le a\le a_j^U$. The $\delta$-entropy number ($\delta$-bracketing entropy number) is defined as $H(\delta,\mathcal A,d)=\log N(\delta,\mathcal A,d)$ ($H_B(\delta,\mathcal A,d)=\log N_B(\delta,\mathcal A,d)$).
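Covering numbers are rarely computable exactly, but for a finite collection of functions evaluated on a grid a greedy construction gives a valid $\delta$-cover, and hence an upper bound on $N(\delta,\mathcal A,d)$. The sketch below uses the supremum distance over the grid as $d$ (an assumption made for illustration) and hypothetical helper names; it does not compute bracketing numbers.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sup_dist(a: Vector, b: Vector) -> float:
    """Supremum distance between two functions evaluated on the same grid."""
    return max(abs(x - y) for x, y in zip(a, b))


def greedy_cover(funcs: List[Vector], delta: float,
                 dist: Callable[[Vector, Vector], float] = sup_dist) -> List[int]:
    """Indices of centres forming a delta-cover of `funcs`.

    Every function lies within delta of some centre, so the number of
    centres is an upper bound on N(delta, A, d) for the finite set A = funcs.
    """
    uncovered = list(range(len(funcs)))
    centres = []
    while uncovered:
        c = uncovered[0]
        centres.append(c)
        uncovered = [i for i in uncovered if dist(funcs[i], funcs[c]) > delta]
    return centres


# A = {z -> s z : s = 0, 0.1, ..., 1} on a grid of [0, 1]; d(s, t) = |s - t|.
grid = [j / 20 for j in range(21)]
funcs = [[(i / 10) * z for z in grid] for i in range(11)]
print(greedy_cover(funcs, 0.25))  # [0, 3, 6, 9]
```

Because the chosen centres are also more than $\delta$ apart, the same count is bounded by the packing number at $\delta$, so the greedy bound is never far from the true covering number up to halving $\delta$.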

Before we present the first theorem, define

$$\mathcal K=\{\ell_{\theta,\eta}(x):\|\theta-\theta_0\|\le C_1,\ \|\eta-\eta_0\|_\infty\le C_1,\ J(\eta)<\infty\},$$

for a known constant $C_1<\infty$.

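Theorem 1. Assume conditions (21), (22), (23) and (24) below hold for every $\theta\in\Theta_n$ and $\eta\in V_n$:

$$H_B(\epsilon,\mathcal K,L_2(P))\lesssim\epsilon^{-1/k}, \tag{21}$$

$$p_{\theta,\eta}/p_{\theta,\eta_0}\ \text{is bounded away from zero and infinity}, \tag{22}$$

$$\|\ell_{\theta,\eta}-\ell_{\theta,\eta_0}\|_2\lesssim\|\theta-\theta_0\|+d_\theta(\eta,\eta_0), \tag{23}$$

$$P(\ell_{\theta,\eta}-\ell_{\theta,\eta_0})\lesssim-d_\theta^2(\eta,\eta_0)+\|\theta-\theta_0\|^2. \tag{24}$$

Then we have

$$d_{\tilde\theta_n}(\hat\eta_{\tilde\theta_n,\lambda_n},\eta_0)=O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|),$$

$$\lambda_nJ(\hat\eta_{\tilde\theta_n,\lambda_n})=O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|),$$

for $(\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})$ satisfying $P(\tilde\theta_n\in\Theta_n,\hat\eta_{\tilde\theta_n,\lambda_n}\in V_n)\to1$.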

Condition (21) determines the order of the increments of the empirical processes indexed by $\ell_{\theta,\eta}$. A detailed discussion of how to compute the increments of empirical processes can be found in chapter 5 of [?]. Condition (22) is equivalent to the condition that $p_{\theta,\eta}$ is bounded away from zero uniformly in $x$ for $(\theta,\eta)$ ranging over $\Theta_n\times V_n$. Given that the distance function $d_\theta(\eta,\eta_0)$ in (23) is just $\|p_{\theta,\eta}-p_{\theta,\eta_0}\|_2$, (23) trivially holds provided that condition (22) holds. For the verification of (24), we can do an analysis as follows. The natural Taylor expansion of the criterion function $(\theta,\eta)\mapsto P\ell_{\theta,\eta}$ around the maximum point $(\theta_0,\eta_0)$ implies that $P(\ell_{\theta,\eta_0}-\ell_{\theta_0,\eta_0})\lesssim-\|\theta-\theta_0\|^2$, and the inequality $\log x\le2(\sqrt x-1)$ implies that $P(\ell_{\theta,\eta}-\ell_{\theta,\eta_0})\le-\int(\sqrt{p_{\theta,\eta}}-\sqrt{p_{\theta,\eta_0}})^2\,d\mu\lesssim-\|p_{\theta,\eta}-p_{\theta,\eta_0}\|_2^2$, given condition (22).

We now apply theorem 1 to derive the related convergence rates in the partly linear model in corollary 1. However, we need to strengthen our previous assumptions to require the existence of a known $M<\infty$ such that $f\in\mathcal H_k^M$, where $\mathcal H_k^M=\mathcal H_k\cap\{\|f\|_\infty\le M\}$, and that the density for the joint distribution of $(U,V,C)$ is strictly positive and finite. The additional assumptions here guarantee condition (22). Theorems 2 and 3 below can also be employed to derive the convergence rate of the non-penalized estimated nuisance parameter by setting $\lambda_n$ to zero. However, we would need to assume that $f\in\{g:\|g\|_\infty+J(g)\le M\}$ for some known $M$ when applying these theorems. Thus we can argue that the penalized method enables a relaxation of the assumptions needed for the nuisance parameter.

Corollary 1. Under the above set-up for the partly linear normal model with current status data, we have, for $\tilde\theta_n=\theta_0+o_P(1)$,

$$\|\hat f_{\tilde\theta_n,\lambda_n}-f_0\|_2=O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|), \tag{25}$$

$$\lambda_nJ(\hat f_{\tilde\theta_n,\lambda_n})=O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|). \tag{26}$$

Moreover, if we also assume that $f\in\{g:\|g\|_\infty+J(g)\le M\}$ for some known $M$, then

$$\|\hat f_{\tilde\theta_n}-f_0\|_2=O_P(n^{-k/(2k+1)}+\|\tilde\theta_n-\theta_0\|), \tag{27}$$

provided condition (?) holds.

Remark 2. Comparing (25) and (27), corollary 1 implies that the convergence rate of the penalized estimated nuisance parameter is slower than that of the non-penalized one. This result is not surprising, since the slower rate is the trade-off for a smoother nuisance parameter estimator. However, the advantage of the penalized profile sampler is that we can control the convergence rate by choosing smoothing parameters with different rates. Corollary 1 also indicates that $\|\hat f_{\lambda_n}-f_0\|_2=O_P(\lambda_n)$ and $\|\hat f_n-f_0\|_2=O_P(n^{-k/(2k+1)})$. Note that the convergence rate of the maximum penalized likelihood estimator, $O_P(\lambda_n)$, is deemed the optimal rate in [?]. Similar remarks also hold for corollary 2 below.

The boundedness condition (22) appears hard to achieve in some examples. Hence we propose theorem 2 below to relax this condition by choosing the criterion function $m_{\theta,\eta}=\log[(p_{\theta,\eta}+p_{\theta,\eta_0})/(2p_{\theta,\eta_0})]$. Clearly $m_{\theta,\eta}\ge-\log2$, so it is trivially bounded from below. It is also bounded above for $(\theta,\eta)$ around their true values if $p_{\theta,\eta_0}(x)$ is bounded away from zero uniformly in $x$ and $p_{\theta,\eta}$ is bounded above. The first condition is satisfied if the map $\theta\mapsto p_{\theta,\eta_0}(x)$ is continuous around $\theta_0$ and $p_0(x)$ is uniformly bounded away from zero. The second condition is trivially satisfied in the semiparametric logistic regression model by the given form of the density. The boundedness of $m_{\theta,\eta}$ thus permits the application of lemma 1 below, which is used to verify condition (29) in the following theorem:
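Theorem 2. Assume that for any given $\theta\in\Theta_n$, $\hat\eta_\theta$ satisfies $\mathbb P_nm_{\theta,\hat\eta_\theta}\ge\mathbb P_nm_{\theta,\eta_0}$ for given measurable functions $x\mapsto m_{\theta,\eta}(x)$. Assume conditions (28) and (29) below hold for every $\theta\in\Theta_n$, every $\eta\in V_n$ and every $\epsilon>0$:

$$P(m_{\theta,\eta}-m_{\theta,\eta_0})\lesssim-d_\theta^2(\eta,\eta_0)+\|\theta-\theta_0\|^2, \tag{28}$$

$$E^*\sup_{\theta\in\Theta_n,\,\eta\in V_n,\,\|\theta-\theta_0\|<\epsilon,\,d_\theta(\eta,\eta_0)<\epsilon}\big|\mathbb G_n(m_{\theta,\eta}-m_{\theta,\eta_0})\big|\lesssim\phi_n(\epsilon). \tag{29}$$

Suppose that (29) is valid for functions $\phi_n$ such that $\delta\mapsto\phi_n(\delta)/\delta^\alpha$ is decreasing for some $\alpha<2$, and for sets $\Theta_n\times V_n$ such that $P(\tilde\theta_n\in\Theta_n,\hat\eta_{\tilde\theta_n}\in V_n)\to1$. Then $d_{\tilde\theta_n}(\hat\eta_{\tilde\theta_n},\eta_0)\le O_P^*(\delta_n+\|\tilde\theta_n-\theta_0\|)$ for any sequence of positive numbers $\delta_n$ such that $\phi_n(\delta_n)\le\sqrt n\,\delta_n^2$ for every $n$.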

Lemma 1 below is presented to verify the modulus condition for the continuity of the empirical process in (29). Let $S_\delta=\{x\mapsto m_{\theta,\eta}(x)-m_{\theta,\eta_0}(x):d_\theta(\eta,\eta_0)\le\delta,\ \|\theta-\theta_0\|\le\delta\}$ and write

$$K(\delta,S_\delta,L_2(P))=\int_0^\delta\sqrt{1+H_B(\epsilon,S_\delta,L_2(P))}\,d\epsilon. \tag{30}$$

Lemma 1. Suppose the functions $(x,\theta,\eta)\mapsto m_{\theta,\eta}(x)$ are uniformly bounded for $(\theta,\eta)$ ranging over a neighborhood of $(\theta_0,\eta_0)$ and that

$$P(m_{\theta,\eta}-m_{\theta,\eta_0})^2\lesssim d_\theta^2(\eta,\eta_0)+\|\theta-\theta_0\|^2.$$

Then condition (29) is satisfied for any functions $\phi_n$ such that

$$\phi_n(\delta)\ge K(\delta,S_\delta,L_2(P))\Big(1+\frac{K(\delta,S_\delta,L_2(P))}{\delta^2\sqrt n}\Big).$$

Consequently, in the conclusion of the above theorem we may use $K(\delta,S_\delta,L_2(P))$ rather than $\phi_n(\delta)$.
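To see how Lemma 1 and Theorem 2 produce rates such as $n^{-k/(2k+1)}$ in (27) and (33), one can solve $\phi_n(\delta_n)=\sqrt n\,\delta_n^2$ numerically. The sketch below assumes, for illustration only, the Sobolev-type entropy bound $H_B(\epsilon,S_\delta,L_2(P))\lesssim\epsilon^{-1/k}$, keeps only the leading term $K(\delta)\approx\delta^{1-1/(2k)}$ of (30) (constants and the "1+" inside the square root dropped), and takes $\phi_n$ equal to the bound in Lemma 1. It is a heuristic check of the scaling, not part of any proof; the function names are hypothetical.

```python
import math


def entropy_integral(delta: float, k: int) -> float:
    """Leading term of K(delta) in (30) when H_B(eps) ~ eps^(-1/k), constants dropped."""
    return delta ** (1.0 - 1.0 / (2 * k))


def phi_n(delta: float, n: float, k: int) -> float:
    """The lower bound on phi_n(delta) from Lemma 1."""
    kd = entropy_integral(delta, k)
    return kd * (1.0 + kd / (delta ** 2 * math.sqrt(n)))


def rate_delta_n(n: float, k: int) -> float:
    """Solve phi_n(delta) = sqrt(n) * delta^2 by bisection on log(delta).

    phi_n(delta) / delta^2 is decreasing in delta, so the root is unique.
    """
    lo, hi = math.log(1e-12), math.log(1e3)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        d = math.exp(mid)
        if phi_n(d, n, k) > math.sqrt(n) * d ** 2:
            lo = mid      # phi_n still too large: the root is at a larger delta
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


for k in (1, 2):
    for n in (1e4, 1e6, 1e8):
        # the rescaled root should not depend on n
        print(k, int(n), rate_delta_n(n, k) * n ** (k / (2 * k + 1)))
```

Under these simplifications the root is exactly $\delta_n=u^{-2k/(2k+1)}n^{-k/(2k+1)}$ with $u=(\sqrt5-1)/2$, so the printed rescaled values are constant in $n$ for each $k$.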
Remark 3. Theorem 2 and lemma 1 are theorem 3.2 and lemma 3.3 in [?], respectively. We can apply theorem 2 to the penalized semiparametric logistic regression model by including $\lambda$ in $\theta$, i.e., $m_{\theta,\lambda,\eta}=m_{\theta,\eta}-\frac12\lambda^2(J^2(\eta)-J^2(\eta_0))$. This is accomplished in the following corollary. Note that we assume that the uniform norm and Sobolev norm of $\eta$ are bounded above with known upper bounds when deriving (33) of the corollary, but this assumption is not needed for (31) and (32).

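Corollary 2. Under the above set-up for the semiparametric logistic regression model, we have, for $\lambda_n$ satisfying condition (3) and any $\tilde\theta_n\to\theta_0$ in probability,

$$\|\hat\eta_{\tilde\theta_n,\lambda_n}-\eta_0\|_2=O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|), \tag{31}$$

$$\lambda_nJ(\hat\eta_{\tilde\theta_n,\lambda_n})=O_P(\lambda_n+\|\tilde\theta_n-\theta_0\|). \tag{32}$$

If we also assume that $\eta\in\{g:\|g\|_\infty+J(g)\le M\}$ for some known $M$, then

$$\|\hat\eta_{\tilde\theta_n}-\eta_0\|_2=O_P(n^{-k/(2k+1)}+\|\tilde\theta_n-\theta_0\|). \tag{33}$$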

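Remark 4. Corollaries 1 and 2 imply that $J(\hat f_{\lambda_n})=O_P(1)$ and $J(\hat\eta_{\lambda_n})=O_P(1)$, respectively: take $\tilde\theta_n=\hat\theta_{\lambda_n}$, which is $\sqrt n$-consistent, in (26) and (32), and note that $n^{-1/2}=o_P(\lambda_n)$ under (3). Thus the maximum penalized likelihood estimators of the nuisance parameters in the two examples of this paper are consistent in the uniform norm, i.e., $\|\hat\eta_{\lambda_n}-\eta_0\|_\infty=o_P(1)$ and $\|\hat f_{\lambda_n}-f_0\|_\infty=o_P(1)$, since the sequences $\hat\eta_{\lambda_n}$ and $\hat f_{\lambda_n}$ consist of smooth functions defined on a compact set with asymptotically bounded first-order derivatives.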

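The preceding two theorems imply that the convergence rate of the penalized estimated nuisance parameter is affected by the assigned smoothing parameter. However, the next theorem shows that, under different conditions, this phenomenon may not hold. Let

$$S_{\lambda_n}(\theta,\eta)h=\frac{\partial}{\partial t}\Big|_{t=0}\log\mathrm{lik}_{\lambda_n}(\theta,\eta_t)=\mathbb P_nA_{\theta,\eta}h-2\lambda_n^2\int\eta^{(k)}(z)h^{(k)}(z)\,dz,$$

$$V(\theta,\eta)h=PA_{\theta,\eta}h,$$

$$V_n(\theta,\eta)h=\mathbb P_nA_{\theta,\eta}h,$$

where $\eta_t=\eta+th$ for $h\in\mathcal H_k$ and $A_{\theta,\eta}$ is the appropriate score operator for the model. Note that $\eta_t\in\mathcal H_k$ for sufficiently small $t$. Obviously $S_{\lambda_n}(\theta,\hat\eta_{\theta,\lambda_n})h=0$ and $V(\theta_0,\eta_0)h=0$. We assume that the maps $h\mapsto V(\theta,\eta)h$ and $h\mapsto V_n(\theta,\eta)h$ are uniformly bounded, so that $V_n$ and $V$ can be viewed as maps from the parameter set $\Theta\times\mathcal H_k$ into $\ell^\infty(\mathcal H_k)$. Further we require the following regularity conditions: for some $C_2>0$,

$$\{A_{\theta,\eta}h:\|\theta-\theta_0\|\le C_2,\ d_\theta(\eta,\eta_0)\le C_2,\ h\in\mathcal H_k\}\ \text{is }P\text{-Donsker}, \tag{34}$$

$$\sup_{h\in\mathcal H_k}P(A_{\theta,\eta}h-A_{\theta_0,\eta_0}h)^2\to0\quad\text{as }\theta\to\theta_0\text{ and }\eta\to\eta_0. \tag{35}$$

Theorem 3. Suppose that $V(\cdot,\cdot):\Theta\times\mathcal H_k\mapsto\ell^\infty(\mathcal H_k)$ is Fréchet differentiable at $(\theta_0,\eta_0)$ with derivative $\dot V(\cdot,\cdot):\mathbb R^d\times\mathrm{lin}\,\mathcal H_k\mapsto\ell^\infty(\mathcal H_k)$ such that the map $\dot V(0,\cdot):\mathrm{lin}\,\mathcal H_k\mapsto\ell^\infty(\mathcal H_k)$ is invertible with an inverse that is continuous on its range. Furthermore, we assume that (34) and (35) hold. Then

$$d_{\tilde\theta_n}(\hat\eta_{\tilde\theta_n,\lambda_n},\eta_0)=O_P\big(n^{-1/2}+\|\tilde\theta_n-\theta_0\|+\lambda_n^2J(\hat\eta_{\tilde\theta_n,\lambda_n})\big), \tag{36}$$

for $\tilde\theta_n\to\theta_0$ and $\hat\eta_{\tilde\theta_n,\lambda_n}\to\eta_0$ in probability.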

Remark 5. The preceding theorem is a variation of theorems used 



in 



'1, 



] and 



'20] , among others, to prove the asymptotic normality of the 
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maximum likelihood estimator (9 n , fj n ) . If we can show that \ n J(i)Q A ) = 
Op{\ n +\\9 n — 6*o 1 1 ) by some other means, then [36\) implies that d^(f/^ x n ^ r lo) = 
P {n- 1 / 2 + || 9 n — 9q\\). This indicates that the smoothing effect of the pe- 
nalized method does not occur, which may be due to some very smooth non- 
penalized estimated nuisance parameter. The high degree of the smoothness 
of the non-penalized estimated nuisance parameter can be deduced from its 
fast convergence rate which equals the parametric rate in this instance. 

5. Main Results and Implications. In this section we first present 
second order asymptotic expansion of the log-profile penalized likelihood 
which prepare us for deriving the main results about the higher order struc- 
ture of the penalized profile sampler. The assumptions in section 3 and 
condition (J3j) are assumed throughout. 

Theorem 4. Given 9 n = 9 Xn + op(l), we have 

1 n 

(37) v^(0A„-0o) = ^J2 i o^o(X l )+Op(n 1 / 2 X 2 n ), 

v n i= i 

(38) log P l Xn (9 n ) = logpl Xn (9 Xn )-^(9 n -9 Xn ) T I (9 n -9 Xn ) 

+ P (g Xn (Pn-9 Xn \\)), 

where g Xn (w) = nw 3 + nw 2 X n + nwX^ + n 1//2 A 2 , provided the efficient infor- 
mation Iq is positive definite. 

Remark 6. The results in theorem [^] are useful in there own right 
for inference about 9. {37\j is a second higher order frequentist result in 
penalized semiparametric estimation regarding the asymptotic linearity of 
the maximum penalized likelihood estimator of 9. 
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We now state the main results on the penalized posterior profile distri- 
bution. A preliminary result, theorem [5] with corollary [3] below, shows that 
the penalized posterior profile distribution is asymptotically close enough to 
the distribution of a normal random variable with mean 6\ n and variance 
(n/o) -1 with second order accuracy, which is controlled by the smoothing pa- 
rameter. Similar conclusions also hold for the penalized posterior moments. 
Another main result, theorem [6J shows that the penalized posterior pro- 
file log-likelihood can be used to achieve second order accurate frequentist 
inference for 0. 

Let P%%. be the penalized posterior profile distribution of 9 with respect 
to the prior p(6). Define 

Theorem 5. Assume that 

(39) A Arl (0 n ) = op(1) implies 9 n = 6 + o P (l), 

for every random |$nj G ©■ If p(6q) > and p(-) has continuous and finite 
first order derivative in some neighborhood of 9q, then we have, for any 
— oo < £ < DO, 

(40) sup P^(^I o 1/2 (0 " Xn ) < - M0\ = Opin^Xl), 



where 3><i( - ) is the distribution of the d-dimensional standard normal random 
variable. 
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Corollary 3. Under the assumptions of theorem^ we have that if 
9 has finite second absolute moment, then 

(41) Xn = E^(9) + P (X 2 n ), 

(42) Jo = n^iVar^O))- 1 + 0p{n l l 2 X 2 n ), 

where ^^.(9) and Var^™^{9) are the penalized posterior profile mean and 
penalized posterior profile covariance matrix, respectively. 

We now present another second order asymptotic frequentist property of 
the penalized profile sampler in terms of quantiles. The a-th quantile of 
the penalized posterior profile distribution, r na , is defined as r na = inf{£ : 
P^l(0 < > a}. Without loss of generality, P^(9 < r nQ ) = a. We can 
also define K na = yjn{r na - 6\ n ), i.e., P^(y/n{9 - 6\ n ) < K na ) = a. 

Theorem 6. Under the assumptions of theorem^ and assuming that 
lo(X) has finite third moment with a nondegenerate distribution, then there 
exists a k na based on the data such that P(^/n(9\ n — 9q) < K na ) = a and 
Kna — i^na = 0p(7iV 2 A^) for each choice of K na . 

Remark 7. Theorem^ ensures that there exists a unique a-th quan- 
tile for 9 up to Op(X n ) in the frequentist set-up for each fixed r na . Note that 
T~na is not unique if the dimension of 9 is larger than one. 

Remark 8. Theorem^ corollary [3] and theorem above show that 
the penalized profile sampler generates second order asymptotic frequentist 
valid results in terms of distributions, moments and quantiles. Moreover, 
the second order accuracy of this procedure is controlled by the smoothing 
parameter. 
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Remark 9. Another interpretation for the role of X n in the penalized 
profile sampler is that we can view X n as the prior on J{rf), or on rj to some 
extent. To see this, we can write lik\ n {9,rf) in the following form: 

J 2 (r?) 



lik\ n (9,r)) = lik n (9,rj) x exp 
This idea can be traced back to 



2(i*r) 



2BJ. In other words, the prior on J {if) is a 
normal distribution with mean zero and variance (2A 2 ) -1 . Hence it is natural 
to expect \ n has some effect on the convergence rate of r]. Other possible 
priors on the functional parameter include Dirichlet and Gaussian processes 
which are more commonly used in nonparametric Bayesian methodology. 

6. Examples (Continued). We now illustrate verification of the as- 
sumptions in section 3.3 with the two example that were introduced in sec- 
tion 2. Thus this section is a continuation of the earlier examples. 

6.1. Partly Linear Normal Model with Current Status Data. We will con- 
centrate on the estimation of the regression coefficient 9, considering the 
infinite dimensional parameter / £ Ti^f as a nuisance parameter. The score 
function of 9, £$j, is given as follows: 

lej(x)=uQ(x;9J), 

where 

Q[X ;e, /) = (i- A) - aM|, 

qgj(x) = c — 9u — f(v), and (j) is the density of a standard normal random 

variable. The least favorable direction at the true parameter value is: 

, , s _ E (UQ 2 (X;9,f)\V = v) 
n ° [V) E mX;9J)\V = v) ' 
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where Eq is the expectation relative to the true parameters. The derivation 
of £g f and ho{-) is given in Thus, the least favorable submodel can be 
constructed as follows: 

(43) £(t,9,f)=loglik(t,f t (9J)), 

where ft(9,f) = f + (0 — t)ho. By differentiating (l43|) with respect to 
t or 6, we can obtain the maps assessed in assumption SI, (t,6,f) ^— > 
(d l+m /dt l d9 m )£(t,9,f). The concrete forms of these maps are given in jjj] 
which considers a more rigid model with a known upper bound on the L2 
norm of the kth. derivative. The rate assumptions (jHJ) and (jlOjl have been 
verified previously in corollary [TJ The remaining assumptions are verified in 
the following two lemmas: 

Lemma 2. Under the above set-up for the partly linear normal model 
with current status data, assumptions SI, S2 and El are satisfied. 

Lemma 3. Under the above set-up for the partly linear normal model 
with current status data, condition is satisfied. 

6.2. Semiparametric Logistic Regression. In the semiparametric logistic 
regression model, we can obtain the score function for 9 and r\ by similar 
analysis performed in the first example, i.e. Iq^{x) = (y — F(9w + rj(z)))w 
and AQ !V he !V (x) = (y — F(9w + r](z)))hg tV (z) for J(h) < oo. And the least 



favorable direction at the true parameter is given in 

P [WF(9 W + r ]0 (Z))\Z = 



h (z) 



P [F(9 W + Vo (Z))\Z = z] ' 
where F(u) = F(u)(l — F(u)). The above assumptions plus the requirement 
that J (ho) < oo ensures the identifiability of the parameters. Thus the least 
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favorable submodel can be written as: 



e(t,e,rj) = log uk(t, m (0,v)), 

where T]t(0,rj) = rj + (0 — t)h,Q. By differentiating l(t,8,rj) with respect to t 
or 9, we obtain, 



£(t,6 


,v) = 


(y-F(tw + r)(z) + { 


9-t)h (z)))(w-h (z)), 


£(t, 6 


,T}) = 


-F(tw + r)(z) + (9 - 


t)h {z))(w-h {z)) 2 , 


MM 


,v) = 


-F(tw + r)(z) + (9 - 


t)ha(z))(w - h (z))h (z), 




,v) = 


-F(tw + r)(z) + (9 - 


t)h {z)){w-h (z)f 1 




,v) = 


-F(tw + rj(z) + (9 - 


t)h {z)){w-h (z)) 2 h (z), 


t,e,e(t, 6 


,v) = 


-F(tw + r)(z) + (9 - 


t)h {z)){w-h (z))h 2 {z), 



where F(-) is the second derivative of the function F(-). The rate assump- 
tions have been shown in corollary [2j The remaining assumptions are verified 
in the following two lemmas: 

Lemma 4. Under the above set-up for the semiparametric logistic re- 
gression model, assumptions SI, S2 and El are satisfied. 

Lemma 5. Under the above set-up for the semiparametric logistic re- 
gression model, condition i fggj) is satisfied. 

7. Future Work. Our paper evaluates the penalized profile sampler 
method from the frequentist view and discusses the effect of the smoothing 
parameter on estimation accuracy. One potential problem of interest is how 
to select a proper smoothing parameter in applications. A formal study 
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about the higher order comparisons between the profile sampler procedure 



and fully Bayesian procedure lq j. which assign priors to both the finite 
dimensional parameter and the infinite dimensional nuisance parameter, is 
also interesting. We expect that the involvement of a suitable prior on the 
infinite dimensional parameter would at least not decrease the estimation 
accuracy of the parameter of interest. 

Another worthwhile avenue of research is to develop analogs of the profile 
sampler and penalized profile sampler to likelihood estimation under model 
misspecification and to general M-estimation. Some first order results for 
this setting in the case where the nuisance parameter may not be root-n 
consistent have been developed for a weighted bootstrap procedure in lo| . 

8. Appendix. We first present some technical tools about the entropy 
calculations and increments of empirical processes which will be employed 
in the proofs that follow. 

Tl. For each < C < oo and 5 > we have 

(44) H B (5, { V : IMIoc < C, J(ry) < C}, \\ ■ \U < 

(45) H(S, { V : IMU < C, J( V ) < C}, || • |U) < {j) l ' k . 

T2. Let T be a class of measurable functions such that Pf 2 < 5 2 and 
< M for every / in T . Then 

E* P \\Q n \\ T < K(5,F,L 2 (P)) (l + ^|^M 



where K(5,T, \\ ■ \\) = / ° y/1 + H B (e,T,\\ ■ \\)de. 

T3. Let T = {ft : t £ T} be a class of functions satisfying \ f s (x) — ft(x)\ < 
d(s, t)F(x) for every s and t and some fixed function F. Then, for any norm 
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JV n (2e||F||,.F,|| ■ ||) <N(e,T,d). 

T4. 

(46) - P 9o log ^- > f(Vp~e- VP^) 2 dfi. 

T5. Let P be a class of measurable functions /:BxW^Rona 
product of a finite set and an arbitrary measurable space (W, W). Let P be 
a probability measure oeDxW and let Pyy be its marginal on W. For every 
d G D, let Td be the set of functions w i-> f(d,w) as / ranges over J 7 . If 
every class jF rf is P-Donsker with supj g jc \ Pf(d, W)\ < oo for every d, then 
P is P-Donsker. 

T6. Let P be a uniformly bounded class of measurable functions such 
that for some measurable /o, supj e jr ||/ — /o||oo < 00 • Moreover, assume 
that H B (e,J r ,L 2 (P)) < Ke~ a for some K < 00 and a £ (0,2) and for all 
e > 0. Then 

|(P n -P)(/-/ )| 



sup 



P (n 



-1/2. 



JI/./oll^"/ 2 Vn (-2)/[2(2+«)] 
T7. For a probability measure P, let Pi be a class of measurable functions 
fi : X ^ R, and let P2 denote a class of nondecr easing functions /2 : R 1— ► 
[0, 1] that are measurable for every probability measure. Then, 

H B (e,f 2 {fi),L 2 (P)) < 2H B (e/3,F u L 2 (P))+supH B (e/3,F 2 ,L 2 (Q)). 

Q 

T8. Let P and £ be classes of measurable functions. Then for any prob- 
ability measure Q and any 1 < r < 00, 

(47) H B (2e, T + g,L r (Q))< H B (e, P, L r (Q)) + H B (e, Q, L r (Q)), 

imsart-aos ver. 2006/01/04 file: penalized.tex date: February 2, 2008 



THE PENALIZED PROFILE SAMPLER 29 
and, provided T and Q are bounded by 1, 

(48) H B (2e,F X G,L r (Q)) < H B (e,F,L r (Q)) + H B (e,g,L r (Q)). 

Remark 10. The proof of Tl is found in fjj/- Tl implies that the 
Sobolev class of functions with known bounded Sobolev norm is P-Donsker. 
T2 and T3 are separately lemma 3.4-2 and theorem 2.7.11 in [21]. in 
T4 relates the Kullback-Leibler divergence and Hellinger distance. Its proof 
depends on the inequality that log x < 2{y/x—l) for every x > 0. T5 is lemma 
9.2 in lla J. T6 is a result presented on page 79 of 11 w and is a special case 
of lemma 5.13 on the same page, the proof of which can be found in pages 
79-80. T7 and T8 are separately lemma 15.2 and 9.24 * n UJ- 



Proof of theorem [I]: The definition of fj^ A implies that 



+ P 



< >? n J 2 { m ) + I+II. 



Note that by T6 and assumption (|2ip . we have 



/ < (l + J(f)e n) J)0 P (n-V 2 )x 

+(1 + J{ m ))0 P (n- l l 2 ) x 
By assumption (|24|) . we have 



1 + J ti9njJ 

e n ,y ~ ^° 



1 + J(vo) 



ii < -dl(v §nM , m ) + \\e n -9 \\ 2 . 



V n 2 ( 2fc + 1 ) > . 
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Combining with the above, we can deduce that 
d\ + >?Jl ~ (1 + Jn)Op(n" 1/2 ) x 

+ (1 + J )O P (n- 1 / 2 ) x 

(49) + A 2 J 2 + |K,-#o|| 2 , 

where d n = d§ (fjg ^ ,r?o)j J(,f]o) = Jo and J n = J(fj§ x ). The above in- 
equality follows from assumption (|23p . Combining all of the above inequali- 
ties, we can deduce that 

(50) u\ = Op{l)+Op{l)Un^, 

(51K = v^Opipn - 6 \\ 2 ) + u l n^O P (\ n ) + P (n- 1 A" 1 1| B n - floll 1 ^), 

where u n = (d n + \\0 n - 0q\\)/ (A n + X n J n ) and v n = X n J n + A„. The equation 
(j50|) implies that u n = Op(l). Inserting u n = Op (I) into (f5Tj) . we can know 
that v n = Op{X n + \\9 n — 0o ID) which implies u n has the desired order. This 
completes the whole proof. □ 

Proof of corollary Q]; Conditions f|22 [ ) — (|24[) can be verified easily in this 
example based on the arguments in theorem Q] because igj has finite second 
moment, and pgj is bounded away from zero and infinity uniformly for 
(9, /) ranging over the whole parameter space. Note that dg(f, /o) = \\pej — 
PolU ~ Wloj — Qe ,f 0W2 by Taylor expansion. Then by the assumption that 
EVar(U\V) is positive definite, we know that \\q^ t —Qe Jo II2 = Op(X n + 
\\6 n -9 \\) implies ||/g ni A n - /o||2 = P (X n + \\9 n - 9 \\). Thus we only need to 
show that the e-bracketing entropy number of the function class O defined 



d n + \\9 n — 9q 

, 1 + Jn 



\J n 2(2fc+l) 



\9 n — 9q\ 
1 + Jo 



1-7 
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below is of order e _1//fc to complete the proof of (|25]) - (|26|) : 

= { l + jlf) : ^ ~ °^ ~ Cl > M ~ /o||o ° " Cl ' J(/) < °°} ' 

for some constant C\. Note that £gj(X)/(l + </(/)) can be rewritten as: 

(52) AA" 1 log $ (qgjA) + (1 - A)^ 1 log (1 - $ (g^)) , 
where ^4 = 1 + J(f) and G 0i, where 

01 = { f^j^ : H g ~ II < Ci, 11/ ~ /olloc < Ci, ■/(/) < ooj , 

and where we know H B (e,0 1 ,L 2 (P)) £ e" 1 ^ by Tl. 

We next calculate the e-bracketing entropy number with L 2 norm for the 
class of functions R\ = {k a (t) : t i— » a -1 log &(at) for a > 1 and t G R}. By 
some analysis we know that k a (t) is strictly decreasing in a for t G R, and 
sup tgR \k a (t) — kb(t)\ ^ |<x — b| because \d / da(k a (t))\ is bounded uniformly 
over t G R. In addition, we know that sup a b>A teK \k a (t) — kb(t)\ £ Aq 1 
because the function u i— > ulog Q(u~ l t) has bounded derivative for < u < 1 
uniformly over The above two inequalities imply that the e-bracketing 

number with uniform norm is of order 0(e~ 2 ) for a G [l,e -1 ] and is 1 for 
a > e _1 . Thus we know Hs(e, R\, L 2 ) = 0(loge -2 ). By applying a similar 
analysis to R 2 = {k a (t) : t i— ► a _1 log(l — <&(ai)) for a > 1 and t G R}, we 
obtain that Hb(€, R 2 , L 2 ) = 0(log e~ 2 ). Combining this with T7 and T8, we 
deduce that H B (e, O, L 2 ) < e" 1 ^. This completes the proof of (I2"5D-(I2"UD. 

For the proof of (|27p . we apply arguments similar to those used in the 
proof of theorem [1] but after setting A n , Jo and J n to zero in (|49p . Then 
we obtain the following equality: d 2 = Op{n~ 2k ^ 2k+1 ">) + \\6 n — 6> || 2 + 
Op(ra- 1/2 )||0n - floll 1 - 172 " + Op(n- 1 /2)(||0 n _ fl || + 4)i-i/2fe. By treating 
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\\0 n -0 \\ < n - k / (2k+1 ^ and \\9 n - \\ > rT k ^ 2k+ ^ differently in the above 
equality, we obtain ([27"|) .D 

Proof of corollary Lemma 7.1 in [141 ] establishes that 



( 53 ) •».„„ • 
after choosing 



P6o,vo 



2 + XnJ{V§ n:X J = °P( X n + Pn ~ 9q\ 



mexv = log^±^ - \\\J\v) ~ J\vo)) 



in theorem[2j Note that the map ^ pg vo / ' f w > z (w, z) is uniformly bounded 
away from zero at = Oq and continuous around a neighborhood of 9q. Hence 
fne,\,r) is well defined. Moreover, P n w-e,A,% x > ^Vi^^a^o by the inequality 
that ((p dtV +p e , r)o )/2p0 tVo ) 2 > (p8, v /pe, Vo )- (153} now directly implies fl32]). For 
the proof of (|31|) . we need to consider the conclusion of lemma 7.4 (i), which 
states that 

(54) \\pe, v - Pe , Vo h ~ (||0-0oll A1 + \\\v~Vo\ A 1|| 2 ) A 1. 

Thus we have proved (I3"T1) . For (1551) . we just replace the vtlq \ „ with mg n 

n . . 

in the proof of lemma 7.1 in [14 ] . Thus we can show that dg(r), r/o) = r; — 
Pe ,rio\\2- By combining lemma [1] and (f54"|) . we know that \\fjg — ??o 1 1 2 = 
P (5 n + \\6 n - \\), for 5 n satisfying K(5 n ,S s „, L 2 (P)) < \fnb 2 n . Note that 
K(8,Ss, L,2(P)) is as defined in ([30]) . By similar analysis as used in the proof 
of lemma 7.1 in [14] and the strengthened assumption on 77, we then find 
that K(5 n ,S$ n , L,2(P)) iS 5n 1 ^ 2fe , which leads to the desired convergence 
rate given in (|33p . □ 
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Proof of theorem 0. Note that 

D/An _ p/A 7l 

§n,Vg X ,h ^%,VO,h 

= v (dn,f)e n ,xJh-F n l> : -2X1 J htotf*^ -4 k) )dz 
= -(V n - V)(e n ,f)* x )h + 2X 2 n I h^ ] dz 



-(Y n -V)(9 ,r ]0 )h + o* P (n- 1/2 ) + 2X 2 n J h^^dz 
P (n' 1 / 2 ) + 2X 2 n [ h^^dz. 



The last two equalities in the above follow from assumptions (|34p and (|35p. 
The Frechet differentiability of V(-, •) at (fo,??o) establishes that 



S n ,fl s x ,h ^%,Vo,h 

= V{8 n - fo,% njAn - r] ) + o*p(\\6 n - fo|| + d §n (r)e n ,\ n >Vo)) 
-2XlJh^H^ Xn -rj^)dz. 

Combining the above two sets of equations, we have, by the linearity of 
V(-,-), established that 

7(0, % n;A J = P (n- 1 ' 2 ) + Op(\\6 n - + 2X1 J z h^tlxj 2 - 

Now by the invertibility of V(0,-), we can deduce that dg n (fjg n Xn , r]o) = 
P (n-Va + ||0 n - fo|| + A* J 2 (% njA J). □ 

Proof of theorem [^j We first show (|37p , and then we need to state one 
lemma before proceeding to the proof of fj38[) . For the proof of (137h . note 
that 

= Fj(§ Xn , § Xn , f! Xn ) + 2A* jf Tyfj (z)h { k) (z)dz. 

Combining the third order Taylor expansion of 0\ n \— ► F n £(8\ n ,0,r]) around 
fo, where 6 = 9\ n and r\ = fj\ n , with conditions (fl9j) and ([20]) . the first term 
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in the right-hand-side of the above displayed equality equals P n ^o — Io(6\„ ~ 
Oq) + P (X n + \\§ Xn - 9 \\) 2 . By the inequality 2X 2 n J z fj^ (z)h^ (z)dz < 
A„(J (rjx n ) + J 2 {ho)) an< i assumption (fTUj) . the second term in the right- 
hand-side of the above equality is equal to Op(X n + \\0\ n — #oll) 2 - Combining 
everything, we obtain the following: 

(55) -^jri^UXi) = V^0x n - 9 ) + Op(n 1 /2( An + ||0 An - 9 \\) 2 ). 
v n i= i 

The right-hand-side of (f55j) is of the order Op(y/nX 2 l + y/nw n (l + w n + A n )), 
where w n represents \\0\ n — 9q\\. However, its left-hand-side is trivially Op{l). 
Considering the fact that \fnX 2 n = op(l), we can deduce that 9 Xn — 9 = 
Op(n -1 / 2 ). Inserting this into the previous display completes the proof of 

(EZD. 

We next prove (|55|) . Note that 9 Xn — 9q = Op{n~ 1 / 2 ). Hence the order of 
the remainder terms in ([19]) and ([20]) becomes Op(X n + \\9 n — 9 Xn \\) 2 and 
Op{X n + \\9 n — 9\ n \\), respectively. Expression (f61~j) in lemma[6]below implies 
that 

(56) logpl Xn (9 Xn ) = logpl Xn (9 ) + n(9 Xn -9 ) T ¥j 

- ^{0x n - 0oflo(0x n ~ Bo) + P {n^ 2 X 2 n ). 

The difference between ()56[) and (|6ip generates 

logp/A„(4) = \ogpl Xn {h n ) + n(9 n - § Xn f (fJ - I (9 Xn - 9 )) 
- \ (On ~ h n ) T Wn - 9 Xn ) + P {g Xn (\\9 n - 9\ n \\)). 

(f38j) is now immediately obtained after considering (|37|) . □ 

Proof of theorem^ Suppose that F Xn (-) is the penalized posterior profile 
distribution of yJnQ n with respect to the prior p(9), where the vector g n 
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~i/2 

is defined as I (9 — 6 n ). The parameter set for g n is E n . F Xn [-) can be 
expressed as: 

J en es n P^Xn + h On) ^JeZ) d Sn 
Note that dg n in the above is the short notation for dg n \ x . . . x dg nc [. To 
prove theorem we first partition the parameter set S n as {S n n {||^ n ||2 > 
r n}} U {H n n {||^n||2 < r n}}- By choosing the proper order of r n , we find 
the posterior mass in the first partition region is of arbitrarily small order, 
as verified in lemma [5jl immediately below, and the mass inside the second 
partition region can be approximated by a stochastic polynomial in powers of 
n~ 1//2 with error of order dependent on the smoothing parameter, as verified 
in lemma 02 below. This basic technique applies to both the denominator 
and the numerator, yielding the quotient series, which gives the desired 
result. 

lemma\^l. Choose r n = o(n -1 / 3 ) and \frir n — > oo. Under the conditions 
of theorem [5j we have 

i 

(58) / P {k n +I hn) PlxA9 " n ±X^- dg n = P (n~ M ), 

J\\en\\>r» pl\ n {V\J 

for any positive number M. 

Proof: Fix r > 0. We then have 



i 



P\ u \n + I o Qn) : — 7~ — ag r , 

Bn\\>r Ph n {V\J 



< I{A r Xn < -n-5}exp(-^) f p(0)d6 + I{A r Xn > -n"*}, 



where A^ n = supy n \\ >r A Xn (d\ n + £„/ 1/2 ). Then by lemma 3.2 in Q], 
I{A r Xn > — n^a} = Op(n~ M ) for any fixed r > 0. This implies that there 
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exists a positive decreasing sequence r n = o(ra -1 / 3 ) with y/nr n — > oo such 
that ([581) holds. □ 

lemma\^2. Choose r n = o(n~ 1//3 ) and \fnr n — > oo. Under the conditions 
of theorem [5l we have 



P l X n (°\n + J 2 Sri) ,A . j~\ n ( n T X n 

-P( X„ + ^0 0n) - ex P ( -^QnQn ) P{Q\ n , 



'llenll^n 

(59) 



Pk»(0) 



Xdg n = Op(Xl). 



Proof: The posterior mass over the region \\g n \\2 < r n is bounded by 

i 

P l \ n { e \ n + h 2 Qn) 



Qn \ \2<r„ 



+ 



\\Qn\\2<r n 



1 



-p(o> 



exp ( ~-QnQn ) p(G\n 



ph n {0 Xn +l o 2 Q n ) ,a i r \ \ phJhn±Jo 2 6n) ,& 
■P{V\n + 7 o On) ; — -*— P{V\ n 



Ph n ( e \n) 



dg n (*) 
dg r , 



By (|38|) . we obtain 
(*) = / 



p(0 A Jexp (-1^1 |exp(0 P ( 5An (||^| 



Obviously the order of (*) depends on that of | exp(Op(g\ n (\\g n \\))) — 1| 
for X n satisfying (J3J) and \\g n \\ < r n . In order to analyze its order, we par- 
tition the set {A„ = op(n~ 1 / 4 ) and A" 1 = P (n k l {2k+1 ">)} with the set 
{A n = P (n-V 3 )} : i.e. U n = {X n = (^(n" 1 /*) and A" 1 = P (n fc /( 2fc+1 ))} n 
{A n = Opin- 1 / 3 )} and L n = {X n = o P ( n - l l A ) and A" 1 = P (n fc /( 2fc+1 ))} n 
{A n = P {n- l / 3 )} c . For the set U n , we have \exp(0 P (gx n (\\g n \\))) ~ 1| = 
ffAndknll) xOp(l). For the set L n , we have Op(g\ n (\\g n \\)) = P (n\\g n \\X 2 n + 



n 



1/2 A 2 ). We can take 



-l-<5 \-2 



A n for some 5 > such that y^V, 



oo 



and r n = oin' 1 / 3 ). Then | exp(0 P ( 5An (||^||))) - 1| = (n||^||A 2 + n x / 2 A 2 ) x 
Op(l). Combining with the above, we know that (*) = Op(A 2 ). By similar 
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analysis, we can also show that (**) has the same order. This completes the 
proof of lemma [5j 2. □ 

We next start the formal proof of theorem[5j By considering both lemma[5jl 
and lemma [5j 2, we know the denominator of (|57p equals 

dg n + Op(\l). 



exp [~zQ n Qn] p(PXn) 
'{||0n||2<r„}nE„ L V z 

The first term in the above display equals 

n- 1/2 p{k n ) I _ e-<^' 2 du n = n- l / 2 p(6 Xn ) [ e~ u ^ 2 du n 

+ o(x 2 n ), 

where u n = \fng n . The above equality follows from the inequality that 
fx° e ~ y2 ^ 2 dy < x~ l e~ x2 / 2 for any x > 0. Consolidating the above analyses, 
we deduce that the denominator of ([57]) equals n~zp(6\ n ){2ir) d l 2 + Op(X 2 l ). 
The same analysis also applies to the numerator, thus completing the whole 
proof. □ 

Proof of corollary^- We only show (|41[) in what follows. (|42p can be veri- 
fied similarly. Showing (|4"T|) is equivalent to establishing E^ n x (g n ) = Op(X 2 x ). 
Note that Eg^ x (g n ) can be written as: 



By analysis similar to that applied in the proof of theorem [5] we know 
the denominator in the above display is nT 1 ! 2 (2ir) d ^ 2 p{9 \ n ) + Op(X 2 l ) and 
the numerator is a random vector of order Op(n~ 1 ^ 2 X 2 l ). This yields the 
conclusion. □ 
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Proof of theorem® Note that (HOj) implies K na = I l ^ 2 z a + Op(n 1 / 2 A,^), 
for any £ < a < 1 — £, where £ £ (0, Note also that the a-th quantile 
of a cZ dimensional standard normal distribution, z a , is not unique if d > 1. 
The classical Edgeworth expansion implies that i^n" 1 ^ £™ =1 J- 1/2 iopr f ) < 
-^a + ci n (a)) = q, where a n (a) = 0(n -1 / 2 ), for £ < a < 1 — £. Note that 
a n (a) is uniquely determined for each fixed z a since £o(^Q) has at least one 

~— 1/2 

absolutely continuous component. Let £ nQ , = I Q z a + (\/n(0A n — #o) ~~ 
n- 1 / 2 Er=i^ 1 4(X i )) + I ~ 1/2 «n(a)- Then P(V^(0 An - 9 Q ) < k na ) = a. 
Combining with (f37|) . we obtain k na = K na + Op(n 1 / 2 A 2 i ). The uniqueness 
of ^na up to order Op(n 1 / 2 A 2 ) follows from that of a n (a) for each chosen 
z a .U 

Proof of lemma® Assumptions SI and S2 are verified in lemma 5 of 
For the verifications of the assumption El, we first show the asymptotic 
equicontinuity condition (fT5|) . Without loss of generality, we assume that A n 
is bounded below by a multiple of n - fc /( 2fc + 1 ) an d bounded above by n -1 / 4 
in view of Thus 

e(.6 ,e j § x )-i \ \\f § A n -/o||i 



p 



Op (n 2fc+i^j 



n—*(K + \K - 9 \\) J n—i(\ n + \\6 n - O ||) 2 
where ()25[) implies the equality in the above expression. 

By ([25D, we know that J(/^ A J = P (1 + ||0„ - O II An) and \\f§ n<x Joo 

it 

follows: 



is bounded by some constant, since / E 7i¥ . We then define the set Q n as 



fr A ' /M ° : J(/) < C n (l + tM), H/IU < M, \\9 — 9 \\ < S] 
n—2{\ n + \\6-9 \\) A ™ J 



n 



{g e L 2 (P) : Pg 2 < C n rT^} 



imsart-aos ver. 2006/01/04 file: penalized.tex date: February 2, 2008 



THE PENALIZED PROFILE SAMPLER 39 

for some 5 > 0. Obviously the function rr 1 /^ 2 )^, , f§ nM )-lo)/(K.+ 
\\&n — II)) ^ Qn on a set of probability arbitrarily close to one, as C n — * oo. 
If we can show lim n _ >00 i?*||<G n ||Q n < oo by T2, then assumption (fT5|) is 
verified. Note that £(9o,8o,f) depends on / in a Lipschitz manner. Conse- 
quently we can bound Hs(e, Q n , L,2(P)) by the product of some constant 
and H(e,TZ n , ^(-P)) in view of T3. lZ n is defined as 

{H n (f) : J(H n {f)) < K 1 *- 1 * 41 *** >\\Hn{j)U *> K l n- im+2) ), 

where H n (f) = f / (n l ^ Ak+2 \\ n + \\9 - 6 \\)). By fl, we know that 

H(e,TZ n ,L 2 (P)) < (A^nTMVe) 1 ^. 

Note that <5 n = n -1 /* 4 ** 2 ) and M n = n (2fc-l)/(4fc+2) in X2 Xhus by calcu _ 
lation we know that if (5 n , Q n ,L 2 (P)) £ >^ 1/2k nr 1 ^ 4k+i h Then by T2 we 
can show that linin^oo £'*||G n ||Q n < oo. 

We next show (fl~8|) . It suffices to verify that the sequence of classes of 
functions V n is P-Glivenko-Cantelli, where V n = {(S^> (9 n ,9 n , fg ^ )(%)}, for 
every random sequence 9 n — ► 9q and 9 n — > #o m probability. A Glivenko- 
Cantelli theorem for classes of functions that change with n is needed. By 
revising theorem 2.4.3 in [2l| with minor notational changes, we obtain 
the following suitable extension of the uniform entropy Glivenko-Cantelli 
theorem: Let J- n be suitably measurable classes of functions with uniformly 
integrable functions and H(e,J : n ,Li(F n )) = o* P (n) for any e > 0. Then 
||P n — P\\p n ~^ i n probability for every e > 0. We then apply this revised 
theorem to the set J- n of functions £^(t,9, f) with t and 9 ranging over 
a neighborhood of 9$ and X n J(f) bounded by a constant. By the form of 



imsart-aos ver. 2006/01/04 file: penalized.tex date: February 2, 2008 



40 G. CHENG AND M. R. KOSOROK 

£( 3 >(t, 9, /), the entropy number for V n is equal to that of 

= {cf>(q tM e,f)(x))R(q tMej} (x)) : (t,9) G ^ ,A n J(/) < C, ||/||oo < Af}. 



By arguments similar to those used in lemma 7.2 of 14], we know that 



supg H(e, P n , Li(Q)) ^ (1 + A f ^ 1 /e) 1 / fe = op(n). Moreover, the T n are uni- 
formly bounded since / G Ti-jf ■ Considering the fact that the probability 
that V n is contained in T n tends to 1, we have completed the proof of (| 18[) . 

For the proof of (fTBj) . we only need to show that G n (£(9o, 6 n , x ^)—£q) = 
op(l) since £o{x) is uniformly bounded in x. Note that we only need to show 
(fl"6D holds for 

&n — @n + o(n based on the arguments in lemma [5l 2, We 
next show that G n (I(9 , 9 n , f §n X J - £ ) = o P (l + n 1/3 ||#„ - 6 \\) = o P (l). 
By the rate assumptions Rl, we have 

\l + nW\\9 n -H J ~ (l + nV3||^- 0o ||)2 ^ 
We next define Q n as follows: 

{M^TM : J(/) 5 c " (1 + ^ "'- £ M ' 119 - e °» < 4 } 

n{ 9 eL 2 (P):P ff 2 <^}. 

Obviously the function (l(0 o , n , fg n>Xn ) ~ *o)/(l + ™ 1/3 ||#n - #o||) G Q„ on 
a set of probability arbitrarily close to one, as C n — ► 00. If we can show 
lim n ,_, 00 -E*||G n ||g n — > by T2, then the proof of (|16p is completed. Accord- 
ingly, note that £(9q,9,J) depends on (#,/) in a Lipschitz manner. Conse- 
quently we can bound Hs(e, Q n , ^(P)) by the product of some constant 
and (H(e,il n , L/2(P)) +log(l/e)) in view of T3. TZ n is defined as 

{H n (f) : J(H n (f)) < 1 + (n 1 / 3 ^)- 1 , [|#n(/)[U £ 1 + (n 1 ^)" 1 }, 
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41 



where H n {f) = //(l + n 1 / 3 ^ - 6» || ) - By we know that 



H(e,K n ,L 2 (P)) < ((l + n-^X-^/ef/K 



Then by analysis similar to that used in the proof of (115D , we can show that 
linin^oo £/*||G n ||g — > in view of T2. This completes the proof of f)16[) . 

For the proof of (fT7|) . it suffices to show that G n (^t,e(do, &ni f§ \ ) — 
4,0 (#o> ^0; /o)) = °p(1) f° r 6n = 6n + o(n -1 / 3 ) and for # n between # n and 
#0) in view of lemma 02. Then we can show that G n (£t ! e(8o, m fg A ) — 
^t,e{Goi ^0; /o)) = op(l + n 1 / 3 !!^ — 9q\\) = op(l) by similar analysis as used 
in the proof of (|16() ,D 

Proof of lemma\^ By the assumption that A\ n (6 n ) = op(l), we have 
A\ n (6 n ) — A\ n (9o) > op(l). Thus the following inequality holds: 

uk(e n J §nXn ,Xi) 



n 



i=l 



lik{9 , fe ,x n ,Xi 



n- L K[J 2 (fe n ,xJ ~ J 2 (fe ,xJ] > o P (l) 



By considering assumption (jlOp . the above inequality simplifies to 

H(6 n ,fs x ;Xi 



n 



-i 



> o P (l), 



i=i H(9 ,fe ,x n ;Xi 

where $H(\theta,f;X) = \Delta\,\Phi(C - \theta^T Z - f(W)) + (1 - \Delta)\{1 - \Phi(C - \theta^T Z - f(W))\}$.
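For concreteness, the likelihood contribution $H(\theta,f;X)$ can be coded directly. The sketch below is only an illustration under stated assumptions: $\theta$ and the linear covariate are scalars, the covariates are read as $Z$ (linear part) and $W$ (nonparametric part), and the nuisance function `f_example` and the observation values are hypothetical. It checks only that $H$ assigns valid Bernoulli probabilities to the two values of $\Delta$.

```python
import math
from typing import Callable


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), computed with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def current_status_H(theta: float, f: Callable[[float], float],
                     c: float, delta: int, z: float, w: float) -> float:
    """H(theta, f; X) for one observation X = (C, Delta, Z, W).

    Assumes scalar theta and Z. H gives probability Phi(c - theta*z - f(w))
    to Delta = 1 and the complementary probability to Delta = 0.
    """
    p = std_normal_cdf(c - theta * z - f(w))
    return delta * p + (1 - delta) * (1.0 - p)


# Hypothetical nuisance function and observation, for illustration only.
f_example = lambda w: math.sin(2.0 * math.pi * w)
h1 = current_status_H(0.5, f_example, c=1.0, delta=1, z=0.3, w=0.2)
h0 = current_status_H(0.5, f_example, c=1.0, delta=0, z=0.3, w=0.2)
print(h1, h0, h1 + h0)  # the two outcomes of Delta have probabilities summing to one
```

Because $\Phi$ is smooth and bounded, $H$ inherits the Lipschitz dependence on $(\theta, f)$ that the Donsker argument relies on.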
By arguments similar to those used in lemma 2 and by T5, we know that $H(\tilde\theta_n,\hat f_{\tilde\theta_n,\lambda_n};X)$ belongs to some P-Donsker class. Combining this conclusion with the inequality $a\log x \le \log(1 + a\{x - 1\})$ for a fixed $a \in (0,1)$ and any $x > 0$ (which holds for every $a \in (0,1)$ by the concavity of $\log$), we can show that



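$$P\log\left(1 + a\left\{\frac{H(\tilde\theta_n,\hat f_{\tilde\theta_n,\lambda_n};X)}{H(\theta_0,\hat f_{\theta_0,\lambda_n};X)} - 1\right\}\right) \ge o_P(1). \qquad (60)$$

The remainder of the proof follows that of lemma 6 in [?]. □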
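The inequality $a\log x \le \log(1 + a\{x-1\})$ used before (60) follows from concavity, since $1 + a(x-1) = (1-a)\cdot 1 + a x$. A plausible reason for working with the transformed ratio is that, unlike $\log x$, it is bounded below by $\log(1-a)$ as $x \to 0$. The following sketch is a numerical spot check on an arbitrary grid of our choosing, not a proof.

```python
import math


def concavity_gap(a: float, x: float) -> float:
    """log(1 + a(x - 1)) - a*log(x).

    Since 1 + a(x - 1) = (1 - a)*1 + a*x, concavity of log gives
    log(1 + a(x - 1)) >= (1 - a)*log(1) + a*log(x), so the gap is >= 0.
    """
    return math.log1p(a * (x - 1.0)) - a * math.log(x)


a_grid = [0.1, 0.25, 0.5, 0.75, 0.9]
x_grid = [1e-6, 0.01, 0.5, 1.0, 2.0, 10.0, 1e6]

gaps = [concavity_gap(a, x) for a in a_grid for x in x_grid]
# The transformed ratio stays above log(1 - a) for every x > 0.
lower_ok = all(math.log1p(a * (x - 1.0)) >= math.log(1.0 - a)
               for a in a_grid for x in x_grid)
print(min(gaps) >= -1e-12, lower_ok)
```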
Proof of lemma ??. The maps (11) are uniformly bounded since $F(\cdot)$, $\dot F(\cdot)$ and $\ddot F(\cdot)$ are all uniformly bounded on $(-\infty,+\infty)$. This completes the verification of S1. Note that $(W,Z)$ takes values in $[0,1]^2$ and $h_0(\cdot)$ is intrinsically bounded over $[0,1]$. Hence we can show that the Fréchet derivatives of $\eta \mapsto \tilde\ell(\theta_0,\theta_0,\eta)$ and $\eta \mapsto \tilde\ell_{t,\theta}(\theta_0,\theta_0,\eta)$ for any $\eta \in \mathcal{H}_k$ are bounded operators, from which we can deduce that $|\tilde\ell(\theta_0,\theta_0,\eta)(X) - \tilde\ell_0(X)|$ is bounded by the product of some integrable function and $|\eta - \eta_0|(Z)$. This ensures (12) and (13). For (14),
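$P\tilde\ell(\theta_0,\theta_0,\eta)$ can be written as $P(F(\theta_0 w + \eta_0(z)) - F(\theta_0 w + \eta(z)))(w - h_0(z))$, since $P\tilde\ell_0 = 0$. Note that $P(w - h_0(z))\dot F(\theta_0 w + \eta_0(z))(\eta - \eta_0)(z) = 0$. This implies that

$$P\tilde\ell(\theta_0,\theta_0,\eta) = P\left(F(\theta_0 w + \eta_0(z)) - F(\theta_0 w + \eta(z)) + \dot F(\theta_0 w + \eta_0(z))(\eta - \eta_0)(z)\right)(w - h_0(z)).$$

Moreover, by a standard Taylor expansion, $|F(\theta_0 w + \eta) - F(\theta_0 w + \eta_0) - \dot F(\theta_0 w + \eta_0)(\eta - \eta_0)| \le \|\ddot F\|_\infty|\eta - \eta_0|^2$. Since $w - h_0(z)$ is bounded, $|P\tilde\ell(\theta_0,\theta_0,\eta)|$ is therefore bounded by a constant multiple of $P(\eta - \eta_0)^2(z)$. This proves (14).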
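The two facts behind (14) can be spot-checked numerically for the logistic CDF $F(u) = 1/(1+e^{-u})$, for which $\dot F = F(1-F)$, $\ddot F = F(1-F)(1-2F)$ and $\|\ddot F\|_\infty = 1/(6\sqrt{3})$. Part (a) checks the Taylor bound (the Lagrange remainder even gives a factor $1/2$). Part (b) checks the orthogonality identity on a toy discrete law for $(W,Z)$. It assumes the usual least favorable direction $h_0(z) = E[W\dot F(\theta_0 W + \eta_0(Z)) \mid Z = z]/E[\dot F(\theta_0 W + \eta_0(Z)) \mid Z = z]$, which this section does not state. The values of `theta0`, `eta0`, the probabilities and the direction `g` are all hypothetical.

```python
import itertools
import math


def F(u: float) -> float:
    """Logistic CDF F(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def F_dot(u: float) -> float:
    p = F(u)
    return p * (1.0 - p)


def F_ddot(u: float) -> float:
    p = F(u)
    return p * (1.0 - p) * (1.0 - 2.0 * p)


# sup |F''| is attained where F(u) = 1/2 -/+ sqrt(3)/6 and equals 1/(6*sqrt(3)).
F_DDOT_SUP = 1.0 / (6.0 * math.sqrt(3.0))

# (a) Taylor remainder: |F(a+h) - F(a) - F'(a) h| <= (1/2) ||F''|| h^2 <= ||F''|| h^2.
worst_ratio = 0.0
for a, h in itertools.product([-4.0, -1.0, 0.0, 0.7, 3.0],
                              [-3.0, -1.0, -0.1, 0.1, 1.0, 3.0]):
    remainder = abs(F(a + h) - F(a) - F_dot(a) * h)
    worst_ratio = max(worst_ratio, remainder / (F_DDOT_SUP * h * h))
print(worst_ratio <= 0.5 + 1e-9)

# (b) P(w - h0(z)) F'(theta0 w + eta0(z)) g(z) = 0 on a toy discrete law.
theta0 = 0.8                                  # hypothetical true parameter
eta0 = {0.0: -0.5, 0.5: 0.2, 1.0: 1.0}        # hypothetical eta0 on the z-grid
w_vals = [0.0, 0.25, 0.75, 1.0]
weights = {(w, z): 1 + i + 2 * j
           for i, w in enumerate(w_vals) for j, z in enumerate(eta0)}
total = sum(weights.values())
prob = {key: val / total for key, val in weights.items()}


def h0(z: float) -> float:
    """Assumed least favorable direction E[W F'(.) | Z=z] / E[F'(.) | Z=z]."""
    num = sum(prob[(w, z)] * w * F_dot(theta0 * w + eta0[z]) for w in w_vals)
    den = sum(prob[(w, z)] * F_dot(theta0 * w + eta0[z]) for w in w_vals)
    return num / den


g = {0.0: 2.0, 0.5: -1.0, 1.0: 0.3}           # arbitrary direction (eta - eta0)(z)
inner = sum(p * (w - h0(z)) * F_dot(theta0 * w + eta0[z]) * g[z]
            for (w, z), p in prob.items())
print(abs(inner) < 1e-12)
```

The identity in (b) holds for any $g$ because, for each fixed $z$, the inner sum over $w$ vanishes by the definition of $h_0$.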

We next verify assumption E1. For the asymptotic equicontinuity condition (15), we first apply an analysis similar to that used in the proof of lemma 2 to obtain



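By lemma 7.1 in [14], we know that $J(\hat\eta_{\tilde\theta_n,\lambda_n}) = O_P(1 + \|\tilde\theta_n - \theta_0\|/\lambda_n)$ and that $\|\hat\eta_{\tilde\theta_n,\lambda_n}\|_\infty$ is bounded in probability by a multiple of $J(\hat\eta_{\tilde\theta_n,\lambda_n}) + 1$. Now we construct the set $\mathcal{Q}_n$ as follows: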

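$$\mathcal{Q}_n = \left\{\frac{\tilde\ell(\theta_0,\theta_0,\eta) - \tilde\ell_0}{n^{-1/(4k+2)}(\lambda_n + \|\theta - \theta_0\|)/\lambda_n} : J(\eta) \le C_n\left(1 + \frac{\|\theta - \theta_0\|}{\lambda_n}\right),\ \|\eta\|_\infty \le C_n(1 + J(\eta)),\ \|\theta - \theta_0\| < \delta\right\} \cap \left\{g \in L_2(P) : Pg^2 < C_n n^{\cdots}\right\}.$$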

Clearly, the probability that the function $n^{1/(4k+2)}\lambda_n(\tilde\ell(\theta_0,\theta_0,\hat\eta_{\tilde\theta_n,\lambda_n}) - \tilde\ell_0)/(\lambda_n + \|\tilde\theta_n - \theta_0\|)$ belongs to $\mathcal{Q}_n$ approaches 1 as $C_n \to \infty$. We next show, using T2, that $\lim_{n\to\infty} E^*\|\mathbb{G}_n\|_{\mathcal{Q}_n} < \infty$. Note that $\tilde\ell(\theta_0,\theta_0,\eta)$ depends on $\eta$ in a Lipschitz manner. Consequently, in view of T3, we can bound $H_B(\epsilon,\mathcal{Q}_n,L_2(P))$ by the product of some constant and $H(\epsilon,\mathcal{R}_n,L_2(P))$, where $\mathcal{R}_n$ is as defined in the proof of lemma 2. By calculations similar to those performed in lemma 2, we obtain $K(\delta_n,\mathcal{Q}_n,L_2(P)) \lesssim \lambda_n^{-1/(2k)}n^{-1/(4k+2)}$. Thus $\lim_{n\to\infty} E^*\|\mathbb{G}_n\|_{\mathcal{Q}_n} < \infty$, and (15) follows.

Next we define $\mathcal{V}_n = \{\ell^{(3)}(\bar\theta_n,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})(X)\}$. Arguments similar to those used in the proof of lemma 2 can be applied directly to verify (18) in this second model. By the form of $\ell^{(3)}(t,\theta,\eta)$, the entropy number of $\mathcal{V}_n$ is bounded above by that of $\mathcal{T}_n = \{F(tw + \eta(z) + (\theta - t)h_0(z)) : (t,\theta) \in V_{\theta_0},\ \lambda_n J(\eta) \le C_n,\ \|\eta\|_\infty \le C_n(1 + J(\eta))\}$. Similarly, we know that $\sup_Q H(\epsilon,\mathcal{V}_n,L_1(Q)) \le \sup_Q H(\epsilon,\mathcal{T}_n,L_1(Q)) \lesssim ((1 + \lambda_n^{-1})/\epsilon)^{1/k} = o_P(n)$. Moreover, the classes $\mathcal{T}_n$ are uniformly bounded. This completes the proof of (18).

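The proofs of (16) and (17) follow arguments quite similar to those used in the proof of lemma 2. Specifically, we can show that $\mathbb{G}_n(\tilde\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n}) - \tilde\ell_0) = o_P(1 + n^{1/3}\|\tilde\theta_n - \theta_0\|) = o_P(1)$ and $\mathbb{G}_n(\tilde\ell_{t,\theta}(\theta_0,\bar\theta_n,\hat\eta_{\bar\theta_n,\lambda_n}) - \tilde\ell_{t,\theta}(\theta_0,\theta_0,\eta_0)) = o_P(1 + n^{1/3}\|\bar\theta_n - \theta_0\|)$. This concludes the proof. □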

Proof of lemma 5. The proof of lemma 5 is analogous to that of lemma ??. □

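Lemma 6. Under the assumptions of theorem ??, we have

$$\log pl_{\lambda_n}(\tilde\theta_n) = \log pl_{\lambda_n}(\theta_0) + n(\tilde\theta_n - \theta_0)^T\mathbb{P}_n\tilde\ell_0 - \frac{n}{2}(\tilde\theta_n - \theta_0)^T\tilde I_0(\tilde\theta_n - \theta_0) + O_P(g_{\lambda_n}(\|\tilde\theta_n - \theta_0\|)) \qquad (61)$$

for any $\tilde\theta_n = \theta_0 + o_P(1)$.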
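As a sanity check on the algebraic form of (61) only, consider the degenerate case with no nuisance parameter and no penalty, $X \sim N(\theta, 1)$, where the efficient score is $\tilde\ell_0(x) = x - \theta_0$ and $\tilde I_0 = 1$. The log-likelihood is then exactly quadratic in $\theta$, so the expansion holds with zero remainder. The toy model, seed and sample below are our own illustration and say nothing about the penalized semiparametric remainder $g_{\lambda_n}$.

```python
import random


def loglik(theta: float, xs: list) -> float:
    """N(theta, 1) log-likelihood without the additive constant."""
    return -0.5 * sum((x - theta) ** 2 for x in xs)


random.seed(1)
theta0 = 0.0
xs = [random.gauss(theta0, 1.0) for _ in range(200)]
n = len(xs)
score_mean = sum(x - theta0 for x in xs) / n  # P_n applied to the score x - theta0
info = 1.0                                     # Fisher information of N(theta, 1)

remainders = []
for theta_tilde in [0.3, -0.05, 0.01]:
    quadratic = (loglik(theta0, xs) + n * (theta_tilde - theta0) * score_mean
                 - 0.5 * n * info * (theta_tilde - theta0) ** 2)
    remainders.append(abs(loglik(theta_tilde, xs) - quadratic))
print(max(remainders))  # zero up to floating-point rounding
```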

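Proof. $n^{-1}(\log pl_{\lambda_n}(\tilde\theta_n) - \log pl_{\lambda_n}(\theta_0))$ is bounded above and below by

$$\mathbb{P}_n\left(\ell(\tilde\theta_n,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n}) - \ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n})\right) - \lambda_n^2\left(J^2(\hat\eta_{\tilde\theta_n,\lambda_n}) - J^2(\eta_{\theta_0}(\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n}))\right)$$

and

$$\mathbb{P}_n\left(\ell(\tilde\theta_n,\theta_0,\hat\eta_{\theta_0,\lambda_n}) - \ell(\theta_0,\theta_0,\hat\eta_{\theta_0,\lambda_n})\right) - \lambda_n^2\left(J^2(\eta_{\tilde\theta_n}(\theta_0,\hat\eta_{\theta_0,\lambda_n})) - J^2(\hat\eta_{\theta_0,\lambda_n})\right),$$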

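respectively. By the third-order Taylor expansion of $\tilde\theta_n \mapsto \mathbb{P}_n\ell(\tilde\theta_n,\theta,\eta)$ around $\theta_0$, with $\theta = \tilde\theta_n$ and $\eta = \hat\eta_{\tilde\theta_n,\lambda_n}$, together with the empirical no-bias conditions (19) and (20), we find that the difference between $\mathbb{P}_n(\ell(\tilde\theta_n,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n}) - \ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n}))$ and $(\tilde\theta_n - \theta_0)^T\mathbb{P}_n\tilde\ell_0 - (\tilde\theta_n - \theta_0)^T(\tilde I_0/2)(\tilde\theta_n - \theta_0)$ is $O_P(n^{-1}g_{\lambda_n}(\|\tilde\theta_n - \theta_0\|))$. By the inequality $J^2(\eta_t(\theta,\eta)) \le 2J^2(\eta) + 2(\theta - t)^2J^2(h_0)$, we know that $\lambda_n^2(J^2(\hat\eta_{\tilde\theta_n,\lambda_n}) - J^2(\eta_{\theta_0}(\tilde\theta_n,\hat\eta_{\tilde\theta_n,\lambda_n}))) = O_P((\|\tilde\theta_n - \theta_0\| + \lambda_n)^2)$, provided assumptions (3) and (10) hold. A similar analysis applies to the lower bound. This proves (61). □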
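The penalty inequality $J^2(\eta_t(\theta,\eta)) \le 2J^2(\eta) + 2(\theta - t)^2J^2(h_0)$ is what one gets if the submodel has the form $\eta_t(\theta,\eta) = \eta + (\theta - t)h_0$ (our assumption here), because $J$ is a seminorm and $(a + b)^2 \le 2a^2 + 2b^2$. The sketch below replaces $J^2(f) = \int (f^{(k)})^2$ with $k = 2$ by a second-difference Riemann sum on $[0,1]$, which is an illustrative discretization and not the paper's definition. It then checks the inequality for random, hypothetical choices of $\eta$, $h_0$ and $\theta - t$.

```python
import math
import random


def J2(values, step):
    """Discrete stand-in for J^2(f) = int (f'')^2 (k = 2 assumed), via second differences."""
    second = [(values[i - 1] - 2 * values[i] + values[i + 1]) / step ** 2
              for i in range(1, len(values) - 1)]
    return sum(s * s for s in second) * step


m = 200
step = 1.0 / m
grid = [i / m for i in range(m + 1)]
random.seed(0)
h0 = [math.sin(3 * x) + x ** 3 for x in grid]         # hypothetical direction h0

checks = []
for _ in range(20):
    c = [random.uniform(-1, 1) for _ in range(4)]
    eta = [c[0] + c[1] * x + c[2] * math.cos(5 * x) + c[3] * x ** 4 for x in grid]
    theta_minus_t = random.uniform(-2, 2)
    # Assumed submodel: eta_t(theta, eta) = eta + (theta - t) * h0.
    eta_t = [e + theta_minus_t * h for e, h in zip(eta, h0)]
    lhs = J2(eta_t, step)
    rhs = 2 * J2(eta, step) + 2 * theta_minus_t ** 2 * J2(h0, step)
    checks.append(lhs <= rhs + 1e-9 * max(1.0, rhs))
print(all(checks))
```

Since the discrete $J$ is a linear map followed by an $L_2$ norm, the inequality holds exactly, not only for these draws; the relative tolerance only absorbs rounding.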